Abstract

Let $X_1,\dots,X_n$ be a collection of iid discrete random variables, and $Y_1,\dots,Y_m$ a set of noisy observations of such variables. Assume each observation $Y_a$ to be a random function of a random subset of the $X_i$'s, and consider the conditional distribution of $X_i$ given the observations, namely $\mu_i(x_i)\equiv\mathbb{P}\{X_i = x_i\mid Y\}$ (a posteriori probability).

We establish a general decoupling principle among the $X_i$'s, as well as a relation between the distribution of $\mu_i$ and the fixed points of the associated density evolution operator. These results hold asymptotically in the large system limit, provided the average number of variables an observation depends on is bounded. We discuss the relevance of our result to a number of applications, ranging from sparse graph codes, to multi-user detection, to group testing.

1 Introduction 

Sparse graph structures have proved useful in a number of information processing tasks, from channel coding [RU07], to source coding [CSV04], to sensing and signal processing [Don06, EJCT06]. Recently, similar design ideas have been proposed for code division multiple access (CDMA) communications [MT06, YT06, RS07], and group testing (a classical technique in statistics) [MT07].

The computational problem underlying many of these developments can be described as follows: 
infer the values of a large collection of random variables, given a set of constraints, or observations, that 
induce relations among them. While such a task is generally computationally hard [BMvT78, Ver89], sparse graphical structures allow for low-complexity algorithms (for instance, iterative message passing algorithms such as belief propagation) that have proved very effective in practice. A precise analysis of
these algorithms and of their gap to optimal (computationally intractable) inference is however a largely 
open problem. 

In this paper we consider an idealized setting in which we aim at estimating $n$ iid discrete random variables $X = (X_1,\dots,X_n)$ based on noisy observations. We will focus on the large system limit $n\to\infty$, with the number of observations scaling like $n$. We further restrict our system to be sparse, in the sense that each observation depends on a bounded (on average) number of variables. A schematic representation is given in Fig. 1.

If $i\in[n]$ and $Y$ denotes collectively the observations, a sufficient statistic for estimating $X_i$ is

$$\mu_i(x_i)\equiv\mathbb{P}\{X_i = x_i\mid Y\}. \tag{1.1}$$

This paper establishes two main results: an asymptotic decoupling among the $X_i$'s, and a characterization of the asymptotic distribution of $\mu_i(\cdot)$ when $Y$ is drawn according to the source and channel model. In the remainder of the introduction we will discuss a few (hopefully) motivating examples, and we will give an informal summary of our results. Formal definitions, statements and proofs can be found in Sections 2 to 6.
Figure 1: Factor graph representation of a simple sparse observation system with $n = 7$ hidden variables $\{X_1,\dots,X_7\}$ and $m = 3$ 'multi-variable' observations $\{Y_1,\dots,Y_3\}$. On the right: the bipartite graph $G$. Highlighted is the edge $(i,a)$.



1.1 Motivating examples 

In this section we present a few examples that fit within the mathematical framework developed in the present paper. The main restrictions imposed by this framework are: (i) the 'hidden variables' $X_i$'s are independent; (ii) the bipartite graph $G$ connecting hidden variables and observations lacks any geometrical structure.

Our results crucially rely on these two features. We will also make some further technical assumptions that partially rule out some of the examples below. However, we expect these assumptions to be removable by generalizing the arguments presented in the next sections.

Source coding through sparse graphs. Let $(X_1,\dots,X_n)$ be iid Bernoulli($p$). Shannon's theorem implies that such a vector can be stored in $nR$ bits for any $R > h(p)$ (with $h(p)$ the binary entropy function), provided we allow for a vanishingly small failure probability. The authors of Refs. [Mur01, Mur04, CSV04] proposed to implement this compression through a sparse linear transformation. Given a source realization $X = x = (x_1,\dots,x_n)$, the stored vector reads

$$y = \mathbb{H}x \bmod 2,$$

with $\mathbb{H}$ a sparse $\{0,1\}$-valued random matrix of dimensions $m\times n$, and $m = nR$. According to our general model, each of the coordinates of $y$ is a function (mod 2 sum) of a bounded (on average) subset of the source bits $(x_1,\dots,x_n)$.
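To make the compression map concrete, here is a minimal Python sketch. Purely for illustration, it assumes that each entry of $\mathbb{H}$ equals 1 independently with probability $\gamma/n$ (the ensemble formalized in Section 2.1). The helper names `sparse_matrix` and `compress` and all parameter values are hypothetical and are not taken from the cited works.

```python
import random

def sparse_matrix(m, n, gamma, rng):
    """Hypothetical sparse {0,1} matrix H, stored row by row as supports:
    rows[a] lists the columns i with H[a][i] = 1, each kept w.p. gamma/n."""
    p_edge = gamma / n
    return [[i for i in range(n) if rng.random() < p_edge] for _ in range(m)]

def compress(rows, x):
    """Stored vector y = H x mod 2: y_a is the parity of the source bits in row a."""
    return [sum(x[i] for i in row) % 2 for row in rows]

rng = random.Random(0)
n, R, p, gamma = 1000, 0.6, 0.1, 3.0   # R = 0.6 exceeds h(0.1), about 0.469 bits
m = int(n * R)                          # number of stored bits, m = nR
x = [1 if rng.random() < p else 0 for _ in range(n)]   # iid Bernoulli(p) source
rows = sparse_matrix(m, n, gamma, rng)
y = compress(rows, x)
print(m, sum(len(r) for r in rows) / m)  # m and the average row weight (about gamma)
```

Storing only the supports keeps the cost of computing $y$ proportional to the number of nonzero entries of $\mathbb{H}$, which stays bounded on average per stored bit.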

The $i$-th information bit can be reconstructed from the stored information by computing the conditional distribution $\mu_i(x_i) = \mathbb{P}\{X_i = x_i\mid Y\}$. In practice, belief propagation provides a rough estimate of $\mu_i$. Determining the distribution of $\mu_i$ (which is the main topic of the present paper) allows one to determine the optimal performance (in terms of bit error rate) of such a system.
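For very small instances, $\mu_i$ can be computed exactly by enumeration, which gives a useful reference point for belief propagation. The sketch below assumes noiseless storage, $y = \mathbb{H}x \bmod 2$, and iid Bernoulli($p$) bits. The name `posterior_marginals` is illustrative, and its cost grows like $2^n$.

```python
from itertools import product

def posterior_marginals(rows, y, p, n):
    """Exact mu_i(1) = P{X_i = 1 | Y = y}, where y = H x mod 2 and X is iid
    Bernoulli(p), by brute-force enumeration of all 2^n source vectors."""
    weight_one = [0.0] * n   # unnormalized P{X_i = 1, Y = y}
    total = 0.0              # unnormalized P{Y = y}
    for x in product((0, 1), repeat=n):
        # Discard source vectors that violate some stored parity.
        if any(sum(x[i] for i in row) % 2 != ya for row, ya in zip(rows, y)):
            continue
        w = 1.0
        for xi in x:
            w *= p if xi else 1.0 - p   # prior weight of this source vector
        total += w
        for i, xi in enumerate(x):
            if xi:
                weight_one[i] += w
    return [w / total for w in weight_one]

# Toy instance: y[0] = x[0] and y[1] = x[1] + x[2] (mod 2); x[3] is not observed.
mu = posterior_marginals([[0], [1, 2]], [1, 1], p=0.1, n=4)
print(mu)   # x[0] is pinned, x[1] and x[2] are symmetric, x[3] keeps its prior
```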

Low-density generator matrix (LDGM) codes. We take $(X_1,\dots,X_n)$ iid Bernoulli(1/2), encode them in a longer vector $X' = (X'_1,\dots,X'_m)$ via the mapping $x' = \mathbb{H}x \bmod 2$, and transmit the encoded bits through a noisy memoryless channel, thus getting output $(Y_1,\dots,Y_m)$ [Lub02]. One can for instance think of a binary symmetric channel BSC($p$), whereby $Y_a = X'_a$ with probability $1-p$ and $Y_a = X'_a\oplus 1$ with probability $p$. Again, decoding can be realized through a belief propagation estimate of the conditional probabilities $\mu_i(x_i) = \mathbb{P}\{X_i = x_i\mid Y\}$.

If the matrix $\mathbb{H}$ is random and sparse, this problem fits in our framework, with the information (uncoded) bits $X_i$'s being the hidden variables and the $Y_a$'s the observations.
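A minimal sketch of the LDGM observation model follows. It assumes that $\mathbb{H}$ is given by the supports of its rows and that the channel is BSC($p$). The function names and the toy matrix are hypothetical.

```python
import random

def ldgm_encode(rows, x):
    """Codeword x' = H x mod 2, with row a of H given by its support rows[a]."""
    return [sum(x[i] for i in row) % 2 for row in rows]

def bsc(bits, p, rng):
    """BSC(p): each bit is flipped independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]

rng = random.Random(1)
rows = [[0, 1], [1, 2], [0, 2], [2]]       # hypothetical sparse H, m = 4, n = 3
x = [rng.randint(0, 1) for _ in range(3)]  # iid Bernoulli(1/2) information bits
codeword = ldgm_encode(rows, x)
y = bsc(codeword, 0.1, rng)                # channel output Y_1, ..., Y_m
print(x, codeword, y)
```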

Low-density parity-check (LDPC) codes. With LDPC codes, one sends through a noisy channel a codeword $X = (X_1,\dots,X_n)$ that is a uniformly random vector in the null space of a random sparse matrix $\mathbb{H}$ [Gal63, RU07]. While in general this does not fit our setting, one can construct an equivalent problem (for analysis purposes) that does, provided the communication channel is binary memoryless symmetric, say BSC($p$).

Within the equivalent problem, $(X_1,\dots,X_n)$ are iid Bernoulli(1/2) random bits. Given one realization $X = x$ of these bits, one computes its syndrome $y = \mathbb{H}x \bmod 2$ and transmits it through a noiseless channel. Further, each of the $X_i$'s is transmitted through the original noisy channel (in our example BSC($p$)), yielding output $Z_i$. If we denote the observations collectively as $(Y, Z)$, it is not hard to show that the conditional probability $\mu_i(x_i) = \mathbb{P}\{X_i = x_i\mid Y, Z\}$ has in fact the same distribution as the a posteriori probabilities in the original LDPC model.
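The equivalent problem can be simulated directly. This sketch assumes that $\mathbb{H}$ is given by the supports of its rows, with row $a$ listing the bits involved in parity check $a$, and that the channel is BSC($p$). `ldpc_equivalent` is a hypothetical helper. The claim that its posteriors match those of the original LDPC model is the one made in the text above, and the code does not check it.

```python
import random

def ldpc_equivalent(rows, n, p, rng):
    """Equivalent LDPC problem over BSC(p): X iid Bernoulli(1/2), the syndrome
    Y = H X mod 2 is observed noiselessly, and each X_i is also seen through BSC(p)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [sum(x[i] for i in row) % 2 for row in rows]       # noiseless syndrome
    z = [xi ^ 1 if rng.random() < p else xi for xi in x]    # Z_i = X_i through BSC(p)
    return x, y, z

rows = [[0, 1, 2], [2, 3]]   # hypothetical parity checks on n = 4 bits
x, y, z = ldpc_equivalent(rows, 4, 0.05, random.Random(3))
print(x, y, z)
```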

Characterizing this distribution allows one to determine the information capacity of such coding systems, and their performance under MAP decoding [Mon05, KM06, KKM07].

Compressed sensing. In compressed sensing, the real vector $x = (x_1,\dots,x_n)\in\mathbb{R}^n$ is measured through a set of linear projections $y_1 = h_1^{\mathsf T}x,\dots,y_m = h_m^{\mathsf T}x$. In this literature no assumption is made on the distribution of $x$, which is only constrained to be sparse in a properly chosen basis [Don06, EJCT06]. Further, unlike in our setting, the vector components $x_i$ do not belong to any finite alphabet. However, some applications justify the study of a probabilistic version, whereby the basic variables are quantized. The next item provides an example.

Network measurements. The size of flows in the Internet can vary from a few packets (as in acknowledgment messages) to several million packets (as in content downloads). Keeping track of the sizes of flows passing through a router can be useful for a number of reasons, such as billing, security, or traffic engineering [EV03].

Flow sizes can be modeled as iid random integers $X = (X_1,\dots,X_n)$. Their common distribution is often assumed to be heavy-tailed. As a consequence, the largest flow is typically of size $n^a$ for some $a > 0$. It is therefore highly inefficient to keep a separate counter of capacity $n^a$ for each flow. It was proposed in [LMP07] to store instead a shorter vector $Y = \mathbb{H}X$, with $\mathbb{H}$ a properly designed sparse random matrix. The problem of reconstructing the $X_i$'s is, once more, analogous to the above.

Group testing. Group testing was proposed during World War II as a technique for reducing the cost of syphilis tests in the army by simultaneously testing groups of soldiers [Dor43]. The variables $X = (X_1,\dots,X_n)$ represent the individuals' status (1 = infected, 0 = healthy) and are modeled as iid Bernoulli($p$) (for some small $p$). Test $a\in\{1,\dots,m\}$ involves a subset $\partial a\subseteq[n]$ of the individuals and returns the positive value $Y_a = 1$ if $X_i = 1$ for some $i\in\partial a$, and $Y_a = 0$ otherwise. It is interesting to mention that the problem has a connection with random multi-access channels, which was exploited in [BMTW84, Wol85].

One is interested in the conditional probability for the $i$-th individual to be infected given the observations: $\mu_i(1) = \mathbb{P}\{X_i = 1\mid Y\}$. Choices of the groups (i.e. of the subsets $\partial a$) based on random graph structures were recently studied and optimized in [MT07].
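The noiseless test model can be written in a few lines. The random design below, in which each individual joins each test independently with probability $\gamma/n$, is a hypothetical choice that mirrors the ensemble of Section 2.1. It is not the optimized design of [MT07], and the function names are illustrative.

```python
import random

def random_groups(n, m, gamma, rng):
    """Hypothetical random design: individual i joins test a w.p. gamma/n."""
    return [[i for i in range(n) if rng.random() < gamma / n] for _ in range(m)]

def group_tests(groups, x):
    """Noiseless outcomes: Y_a = 1 iff X_i = 1 for some i in group a."""
    return [1 if any(x[i] for i in group) else 0 for group in groups]

rng = random.Random(4)
n, m, p, gamma = 200, 60, 0.02, 10.0
x = [1 if rng.random() < p else 0 for _ in range(n)]   # iid Bernoulli(p) statuses
y = group_tests(random_groups(n, m, gamma, rng), x)
print(sum(x), sum(y))   # infected individuals, positive tests
```

Each outcome is a deterministic OR of the statuses in the group, so this kernel has vanishing entries, which is relevant to the softness condition introduced in Section 2.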

Multi-user detection. In a general vector channel, one or more users communicate symbols $X = (X_1,\dots,X_n)$ (we assume, for the sake of simplicity, perfect synchronization). The receiver is given a channel output $Y = (Y_1,\dots,Y_m)$, which is usually modeled as a linear function of the input plus Gaussian noise, $Y = \mathbb{H}X + W$, where $W = (W_1,\dots,W_m)$ are iid normal random variables. Examples are CDMA or multiple-input multiple-output channels (with perfect channel state information) [Ver98, TV05].
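A short simulation of $Y = \mathbb{H}X + W$ with binary symbols follows. As a hypothetical assumption in the spirit of sparse spreading, each entry of $\mathbb{H}$ is $\pm 1$ with probability $\gamma/n$ and 0 otherwise. The name `vector_channel` is illustrative.

```python
import random

def vector_channel(H, x, sigma, rng):
    """Y = H X + W with W_a iid N(0, sigma^2); H is a list of rows."""
    return [sum(h * xi for h, xi in zip(row, x)) + rng.gauss(0.0, sigma) for row in H]

rng = random.Random(5)
n, m, gamma = 8, 6, 2.0
# Hypothetical sparse spreading: each entry is +-1 w.p. gamma/n, else 0.
H = [[rng.choice((-1, 1)) if rng.random() < gamma / n else 0 for _ in range(n)]
     for _ in range(m)]
x = [rng.choice((-1, 1)) for _ in range(n)]   # binary (BPSK) symbols
y = vector_channel(H, x, 0.5, rng)
print(y)
```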

The analysis simplifies considerably if the $X_i$'s are assumed to be normal as well [TH99, VS99]. However, in many circumstances a binary or quaternary modulation is used, and the normal assumption is therefore unrealistic. The non-rigorous 'replica method' from statistical physics has been used to compute the channel capacity in these cases [Tan02]. A proof of the replica predictions has been obtained in [MT06] under some condition on the spreading factor. The same techniques were applied in more general settings in [GV05, GW06b, GW06a].

However, specific assumptions on the spreading factor (and noise parameters) were necessary. Such assumptions ensured that an appropriate density evolution operator has a unique fixed point. The results of the present paper should allow one to prove replica results without conditions on the spreading.

As mentioned above, we shall make a few technical assumptions on the structure of the sparse observation system. These will concern the distribution of the bipartite graph connecting hidden variables and observations, as well as the dependency of the noisy observations on the $X$'s. While such assumptions rule out some of the examples above (for instance, they exclude general irregular LDPC ensembles), we do not think they are crucial for the results to hold.

1.2 An informal overview 

We consider two types of observations: single-variable observations $Z = (Z_1,\dots,Z_n)$, and multi-variable observations $Y = (Y_1,\dots,Y_m)$. For each $i\in[n]$, $Z_i$ is the result of observing $X_i$ through a memoryless noisy channel. Further, for each $a$, $Y_a$ is an independent noisy function of a subset $\{X_j : j\in\partial a\}$ of the hidden variables. By this we mean that $Y_a$ is conditionally independent of all the other variables, given $\{X_j : j\in\partial a\}$. The subset $\partial a\subseteq[n]$ is itself random: for each $i\in[n]$, $i\in\partial a$ independently with probability $\gamma/n$.

Generalizing the above, we consider the conditional distribution of $X_i$, given $Y$ and $Z$:

$$\mu_i(x_i)\equiv\mathbb{P}\{X_i = x_i\mid Y, Z\}. \tag{1.2}$$

One may wonder whether additional information can be extracted by considering the correlations among hidden variables. Our first result is that, for a generic subset of the variables, these correlations vanish. This is stated informally below:

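For any uniformly random set of variable indices $i(1),\dots,i(k)\in[n]$ and any $\xi_1,\dots,\xi_k\in\mathcal{X}$,

$$\mathbb{P}\{X_{i(1)} = \xi_1,\dots,X_{i(k)} = \xi_k\mid Y, Z\}\approx\mathbb{P}\{X_{i(1)} = \xi_1\mid Y, Z\}\cdots\mathbb{P}\{X_{i(k)} = \xi_k\mid Y, Z\}. \tag{1.3}$$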

This can be regarded as a generalization of the 'decoupling principle' postulated in [GV05]. Here the $\approx$ symbol hides the large system ($n, m\to\infty$) limit, and a 'smoothing procedure' to be discussed below.

Locally, the graph $G$ converges to a random bipartite tree. The locally tree-like structure of $G$ suggests the use of message passing algorithms, in particular belief propagation, for estimating the marginals $\mu_i$. Consider the subgraph including $i$, all the function nodes $a$ such that $Y_a$ depends on $X_i$, and the other variables these observations depend on. We refer to the latter as the 'neighbors of $i$.' In belief propagation one assumes these to be independent in the absence of $i$ and of its neighborhood.

For any $j$, neighbor of $i$, let $\mu_{j\to i}$ denote the conditional distribution of $X_j$ in the modified graph where $i$ (and the neighboring observations) have been taken out. Then BP provides a prescription for computing $\mu_i$ in terms of the 'messages' $\mu_{j\to i}$, of the form $\mu_i = \mathsf{F}^n_i(\{\mu_{j\to i}\})$.¹ We shall prove that this prescription is asymptotically correct.

Let $i$ be a uniformly random variable node and $i(1),\dots,i(k)$ its neighbors. Then

$$\mu_i\approx\mathsf{F}^n_i(\mu_{i(1)\to i},\dots,\mu_{i(k)\to i}). \tag{1.4}$$

The neighborhood of $i$ converges to a Galton-Watson tree, with Poisson distributed degrees of mean $\gamma\alpha$ (for variable nodes, corresponding to the variables $X_i$) and $\gamma$ (for function nodes, corresponding to the observations $Y_a$). Such a tree is generated as follows. Start from a root variable node, generate a Poisson($\gamma\alpha$) number of function node descendants, and for each of them an independent Poisson($\gamma$) number of variable node descendants. This procedure is then repeated recursively.
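The recursive construction can be sampled directly. The sketch below uses only the Python standard library, so Poisson variables are drawn with Knuth's product-of-uniforms method (adequate for the small means used here). The nested-list representation and the depth cutoff are illustrative choices, not part of the text.

```python
import math
import random

def poisson(lam, rng):
    """Poisson(lam) sample by multiplying uniforms (Knuth's method; fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def galton_watson(gamma, alpha, depth, rng):
    """Random neighborhood of a root variable node, truncated after `depth`
    generations of variable nodes. A variable node is the list of its function
    node children (Poisson(gamma*alpha) of them); a function node is the list of
    its variable node children (Poisson(gamma) of them)."""
    if depth == 0:
        return []
    return [[galton_watson(gamma, alpha, depth - 1, rng) for _ in range(poisson(gamma, rng))]
            for _ in range(poisson(gamma * alpha, rng))]

rng = random.Random(6)
tree = galton_watson(gamma=3.0, alpha=0.5, depth=2, rng=rng)
print(len(tree), [len(f) for f in tree])  # root degree, then function node degrees
```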

In such a situation, consider again the BP equation (1.4). The function $\mathsf{F}^n_i(\cdots)$ can then be approximated by a random function corresponding to a random Galton-Watson neighborhood, to be denoted as $\mathsf{F}^\infty$. Further, one can hope that, if the graph $G$ is random, then the $\mu_{i(j)\to i}$ become iid random variables. Finally (and this is a specific property of Poisson degree distributions), the residual graph with the neighborhood of $i$ taken out has the same distribution (with slightly modified parameters) as the original one. Therefore, one might imagine that the distribution of $\mu_i$ is the same as that of the $\mu_{i(j)\to i}$'s. Summarizing these observations, one is led to think that the distribution of $\mu_i$ must be (asymptotically for large systems) a fixed point of the following distributional equation

$$\nu\stackrel{d}{=}\mathsf{F}^\infty(\nu_1,\dots,\nu_l). \tag{1.5}$$

This is an equation for the distribution of $\nu$ (the latter taking values in the set of distributions over the hidden variables $X_i$), and is read as follows. When $\nu_1,\dots,\nu_l$ are iid random variables with common distribution $\rho$, then $\mathsf{F}^\infty(\nu_1,\dots,\nu_l)$ has itself distribution $\rho$ (here $l$ and $\mathsf{F}^\infty$ are also random, according to the Galton-Watson model for the neighborhood of $i$). It is nothing but the fixed point equation for density evolution, and can be written more explicitly as

$$\rho(\nu\in A) = \int\mathbb{I}(\mathsf{F}^\infty(\nu_1,\dots,\nu_l)\in A)\,\rho(d\nu_1)\cdots\rho(d\nu_l), \tag{1.6}$$

where $\mathbb{I}(\cdots)$ is the indicator function. In fact our main result states that: (i) the distribution of $\mu_i$ must be a convex combination of the solutions of the above distributional equation; (ii) if such a convex combination is nontrivial (has positive weight on more than one solution), then the correlations among the $\mu_i$'s have a peculiar structure.

¹ The mapping $\mathsf{F}^n_i(\cdot)$ returns the marginal at $i$ with respect to the subgraph induced by $i$ and its neighbors, when the latter are biased according to the messages $\mu_{j\to i}$. For a more detailed description, we refer to Section 2.2.

Assume density evolution to admit the fixed point distributions $\rho_1,\dots,\rho_r$ for some fixed $r$. Then there exist probabilities $w_1,\dots,w_r$ (which add up to 1) such that, for $i(1),\dots,i(k)\in[n]$ uniformly random variable nodes,

$$\mathbb{P}\{\mu_{i(1)}\in A_1,\dots,\mu_{i(k)}\in A_k\}\approx\sum_{\alpha=1}^{r}w_\alpha\,\rho_\alpha(\mu\in A_1)\cdots\rho_\alpha(\mu\in A_k). \tag{1.7}$$

In the last statement we have hidden one more technicality: the stated asymptotic behavior might 
hold only along a subsequence of system sizes. In fact in many cases it can be proved that the above 
convex combination is trivial, and that no subsequence needs to be taken. Tools for proving this will be 
developed in a forthcoming publication. 
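The following sketch illustrates, in a deliberately degenerate case, what a fixed point of density evolution looks like and how several of them can coexist. The model is hypothetical and falls outside the soft setting of this paper. The $X_i$ are uniform bits, each $Y_a$ is the noiseless parity of $X_{\partial a}$, the $Z_i$ carry no information, and $Z_i(\theta)$ reveals $X_i$ with probability $\theta$ (the perturbation formalized in Section 2.1). Under these assumptions every BP message is either a point mass on the true value or uniform, so a distribution $\rho$ is summarized by the probability $z$ of a point mass. On the Galton-Watson tree, a function node pins the root when all of its Poisson($\gamma$) other variables are known, which happens with probability $e^{-\gamma(1-z)}$. By Poisson thinning, the recursion becomes $z\mapsto 1-(1-\theta)\exp\{-\gamma\alpha\,e^{-\gamma(1-z)}\}$.

```python
import math

def de_map(z, theta, gamma, alpha):
    """One density-evolution step for the hypothetical erasure-type model:
    z is the probability that an incoming message is a point mass (X_j known)."""
    # A function node pins the root iff all its Poisson(gamma) other variables
    # are known: probability E[z^K] = exp(-gamma * (1 - z)).
    pin = math.exp(-gamma * (1.0 - z))
    # The root stays unknown iff it is not revealed (prob. 1 - theta) and none of
    # its Poisson(gamma * alpha) function nodes pins it (Poisson thinning).
    return 1.0 - (1.0 - theta) * math.exp(-gamma * alpha * pin)

def de_fixed_point(z0, theta, gamma, alpha, tol=1e-14, max_iter=100_000):
    """Iterate z <- de_map(z) from z0 until the change falls below tol."""
    z = z0
    for _ in range(max_iter):
        z_new = de_map(z, theta, gamma, alpha)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# Starting from z0 = 0 and z0 = 1 reaches the smallest and largest fixed points.
low = de_fixed_point(0.0, theta=0.0, gamma=6.0, alpha=1.0)
high = de_fixed_point(1.0, theta=0.0, gamma=6.0, alpha=1.0)
print(low, high)
```

Because the map is increasing in $z$, iterating from $z = 0$ and from $z = 1$ yields the smallest and the largest fixed points. For the parameters shown they differ, which is the kind of situation in which a nontrivial convex combination in (1.7) is conceivable.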



2 Definitions and main results 

In this section we provide formal definitions and statements. 

2.1 Sparse systems of observations

We consider systems defined on a bipartite graph $G = (V, F, E)$, whereby $V$ and $F$ are vertices corresponding (respectively) to variables and observations ('variable' and 'function nodes'). The edge set is $E\subseteq V\times F$. For greater clarity, we shall use $i, j, k,\dots\in V$ to denote variable nodes and $a, b, c,\dots\in F$ for function nodes. For $i\in V$, we let $\partial i = \{a\in F : (i,a)\in E\}$ denote its neighborhood (and define $\partial a$ analogously for $a\in F$). Further, if we let $n = |V|$ and $m = |F|$, we are interested in the limit $n, m\to\infty$ with $\alpha = m/n$ kept fixed (often we will identify $V = [n]$ and $F = [m]$).

A family of iid random variables $\{X_i : i\in V\}$, taking values in a finite alphabet $\mathcal{X}$, is associated with the vertices of $V$. The common distribution of the $X_i$ will be denoted by $\mathbb{P}\{X_i = x\} = p(x)$. Given $U\subseteq V$, we let $X_U = \{X_i : i\in U\}$ (the analogous convention is adopted for other families of variables). Often we shall write $X$ for $X_V$.

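Random variables $\{Y_a : a\in F\}$ are associated with the function nodes, with $Y_a$ conditionally independent of $Y_{F\setminus a}$, $X_{V\setminus\partial a}$, given $X_{\partial a}$. Their joint distribution is defined by a set of probability kernels $Q^{(k)}$ indexed by $k\in\mathbb{N}$, whereby, for $|\partial a| = k$,

$$\mathbb{P}\{Y_a\in\cdot\mid X_{\partial a} = x_{\partial a}\} = Q^{(k)}(\cdot\mid x_{\partial a}). \tag{2.1}$$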
We shall assume $Q^{(k)}(\cdot\mid x_1,\dots,x_k)$ to be invariant under permutation of its arguments $x_1,\dots,x_k$ (an assumption that is implicit in the above equation). Further, whenever clear from the context, we shall drop the superscript $(k)$. Without loss of generality, one can assume $Y_a$ to take values in $\mathbb{R}^h$ for some $h$ which only depends on $k$.

A second collection of real random variables $\{Z_i : i\in V\}$ is associated with the variable nodes, with $Z_i$ conditionally independent of $Z_{V\setminus i}$, $X_{V\setminus i}$ and $Y$, given $X_i$. The associated probability kernel will be denoted by $R$:

$$\mathbb{P}\{Z_i\in\cdot\mid X_i = x_i\} = R(\cdot\mid x_i). \tag{2.2}$$

Finally, the graph $G$ itself will be random. All the above distributions have to be interpreted as conditional on a given realization of $G$. We shall follow the convention of using $\mathbb{P}\{\cdots\}$, $\mathbb{E}\{\cdots\}$, etc. for conditional probability, expectation, etc. given $G$ (without writing the conditioning explicitly), and write $\mathbb{P}_G\{\cdots\}$, $\mathbb{E}_G\{\cdots\}$ for probability and expectation with respect to $G$. The graph distribution is defined as follows. Both node sets $V$ and $F$ are given. Further, for any $(i,a)\in V\times F$, we let $(i,a)\in E$ independently with probability $p_{\rm edge}$. If we let $n = |V|$ and $m = |F|$ (often identifying $V = [n]$ and $F = [m]$), such a random graph ensemble will be denoted as $\mathbb{G}(n, m, p_{\rm edge})$. We are interested in the limit $n, m\to\infty$ with $\alpha = m/n$ kept fixed and $p_{\rm edge} = \gamma/n$.
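A direct sampler for $\mathbb{G}(n, m, \gamma/n)$ is sketched below. It returns both neighborhoods $\partial i$ and $\partial a$. The $O(nm)$ loop is the simplest faithful implementation rather than an efficient one, and the names are illustrative.

```python
import random

def sample_graph(n, m, gamma, rng):
    """Sample G(n, m, p_edge) with p_edge = gamma/n: each pair (i, a) in V x F is an
    edge independently. Returns neighborhoods d_var[i] (partial i) and d_fun[a] (partial a)."""
    p_edge = gamma / n
    d_var = [[] for _ in range(n)]
    d_fun = [[] for _ in range(m)]
    for a in range(m):
        for i in range(n):
            if rng.random() < p_edge:
                d_fun[a].append(i)
                d_var[i].append(a)
    return d_var, d_fun

n, alpha, gamma = 1000, 0.5, 3.0
d_var, d_fun = sample_graph(n, int(alpha * n), gamma, random.Random(10))
print(sum(map(len, d_var)) / n, sum(map(len, d_fun)) / len(d_fun))  # about gamma*alpha and gamma
```

The two printed averages illustrate the mean degrees $\gamma\alpha$ (variable nodes) and $\gamma$ (function nodes) that appear in the Galton-Watson description of Section 1.2.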

In particular, we will be concerned with the problem of determining the conditional distribution of $X_i$ given $Y$ and $Z$, cf. Eq. (1.2). Notice that $\mu_i$ is a random variable taking values in $\mathcal{M}(\mathcal{X})$ (the set of probability measures over $\mathcal{X}$).

In order to establish our main result we need to 'perturb' the system as follows. Given a perturbation parameter $\theta\in[0,1]$ (which should be thought of as 'small'), and a symbol $*\notin\mathcal{X}$, we let

$$Z_i(\theta) = \begin{cases}(Z_i, X_i) & \text{with probability } \theta,\\ (Z_i, *) & \text{with probability } 1-\theta.\end{cases} \tag{2.3}$$

In words, we reveal a random subset of the hidden variables. Obviously $Z(0)$ is equivalent to $Z$ and $Z(1)$ to $X$. The corresponding probability kernel is defined by (for $A\subseteq\mathbb{R}$ measurable, and $x\in\mathcal{X}\cup\{*\}$)

$$R^\theta(x, A\mid x_i) = [(1-\theta)\,\mathbb{I}(x = *) + \theta\,\mathbb{I}(x = x_i)]\,R(A\mid x_i), \tag{2.4}$$

where $\mathbb{I}(\cdots)$ is the indicator function. We will denote by $\mu^\theta_i$ the analogue of $\mu_i$, cf. Eq. (1.2), with $Z$ replaced by $Z(\theta)$.
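The perturbation (2.3) amounts to an independent coin flip per variable. In the minimal sketch below, the string `"*"` stands in for the symbol $*$ and the observations $Z_i$ are generated by a hypothetical BSC(0.2). Both choices are only for illustration.

```python
import random

STAR = "*"   # stands in for the extra symbol *, assumed not to belong to the alphabet X

def perturb(z, x, theta, rng):
    """Z_i(theta) = (Z_i, X_i) w.p. theta and (Z_i, *) otherwise, independently over i."""
    return [(zi, xi) if rng.random() < theta else (zi, STAR) for zi, xi in zip(z, x)]

rng = random.Random(11)
x = [rng.randint(0, 1) for _ in range(10_000)]
z = [xi ^ (rng.random() < 0.2) for xi in x]   # hypothetical Z: X through BSC(0.2)
z_theta = perturb(z, x, 0.05, rng)
print(sum(s != STAR for _, s in z_theta) / len(x))   # fraction revealed, about theta
```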

It turns out that introducing such a perturbation is necessary for our result to hold. The reason is that there can be specific choices of the system 'parameters' $\alpha$, $\gamma$, and of the kernels $Q$ and $R$, for which the variables $X_i$ are strongly correlated. This happens, for instance, at threshold noise levels in coding. Introducing a perturbation allows one to remove these non-generic behaviors.

We finally need to introduce a technical regularity condition on the laws of $Y_a$ and $Z_i$ (notice that this concerns the unperturbed model).

Definition 2.1. We say that a probability kernel $T$ from $\mathcal{X}$ to a measurable space $\mathcal{S}$ (i.e., a set of probability measures $T(\cdot\mid x)$ indexed by $x\in\mathcal{X}$) is soft if: (i) $T(\cdot\mid x_1)$ is absolutely continuous with respect to $T(\cdot\mid x_2)$ for any $x_1, x_2\in\mathcal{X}$; (ii) we have, for some $M<\infty$, and all $x\in\mathcal{X}$ (the derivative being in the Radon-Nikodym sense)

™r,..,,,M^ (2.5, 

A system of observations is said to have soft noise (or soft noisy observations) if there exists $M<\infty$ such that the kernels $R$ and $Q^{(k)}$, for all $k\ge 1$, are $M$-soft.

In the case of a finite output alphabet the above definition simplifies considerably: a kernel is soft if all its entries are non-vanishing. Although there exist interesting examples of non-soft kernels (see, for instance, Section 1.1), they can often be treated as limit cases of soft ones.
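For a finite output alphabet, the non-vanishing-entries criterion is easy to check mechanically. The sketch below also reports the largest entry ratio $T(y\mid x_1)/T(y\mid x_2)$. This bounds the Radon-Nikodym derivative pointwise and is offered only as one natural candidate for the constant $M$. It is an assumption, since the exact form of condition (ii) is not reproduced here. The function name and the example kernels are hypothetical.

```python
def softness_constant(kernel):
    """kernel[x] is the list of probabilities T(y|x) over a finite output alphabet.
    Returns None if some entry vanishes (not soft); otherwise returns
    max over x1, x2, y of T(y|x1) / T(y|x2), a pointwise bound on the
    likelihood ratio dT(.|x1)/dT(.|x2) (assumed to be what condition (ii) controls)."""
    rows = list(kernel.values())
    if any(t <= 0.0 for row in rows for t in row):
        return None
    return max(r1[y] / r2[y] for r1 in rows for r2 in rows for y in range(len(r1)))

p = 0.1
bsc_kernel = {0: [1 - p, p], 1: [p, 1 - p]}   # BSC(p): soft, largest ratio (1-p)/p
noiseless = {0: [1.0, 0.0], 1: [0.0, 1.0]}    # noiseless channel: not soft
print(softness_constant(bsc_kernel), softness_constant(noiseless))
```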
2.2 Belief propagation and density evolution

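Belief propagation (BP) is frequently used in practice to estimate the marginals (1.2). Messages $\nu^{(t)}_{i\to a}, \hat\nu^{(t)}_{a\to i}\in\mathcal{M}(\mathcal{X})$ are exchanged at time $t$ along the edge $(i,a)\in E$, where $i\in V$, $a\in F$. The update rules follow straightforwardly from the general factor graph formalism [KFL01]:

$$\nu^{(t+1)}_{i\to a}(x_i)\propto p(x_i)\,R^\theta(z_i\mid x_i)\prod_{b\in\partial i\setminus a}\hat\nu^{(t)}_{b\to i}(x_i), \tag{2.6}$$

$$\hat\nu^{(t)}_{a\to i}(x_i)\propto\sum_{x_{\partial a\setminus i}}Q(y_a\mid x_{\partial a})\prod_{j\in\partial a\setminus i}\nu^{(t)}_{j\to a}(x_j). \tag{2.7}$$

Here and below we denote by $\propto$ equality among measures on the same space 'up to a normalization'.² The BP estimate for the marginal of variable $X_i$ is (after $t$ iterations)

$$\nu^{(t+1)}_i(x_i)\propto p(x_i)\,R^\theta(z_i\mid x_i)\prod_{b\in\partial i}\hat\nu^{(t)}_{b\to i}(x_i). \tag{2.8}$$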
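The updates (2.6)-(2.8) can be sketched on a small factor graph with a flooding schedule, in which all function-to-variable messages are refreshed first and then all variable-to-function messages. The data layout is a hypothetical choice: `prior[x]` stands for $p(x)$, `local[i][x]` for the likelihood of the observed $z_i$, and each factor is a pair (scope, q) with q returning $Q(y_a\mid x_{\partial a})$ for the observed $y_a$. The noisy-parity kernel is an illustrative soft example. On a tree, BP is exact, so the sketch compares against brute-force enumeration.

```python
from itertools import product

def normalize(msg):
    total = sum(msg.values())
    return {x: v / total for x, v in msg.items()}

def bp_marginals(n, alphabet, prior, local, factors, iters=20):
    """Flooding BP following Eqs. (2.6)-(2.8). prior[x] = p(x),
    local[i][x] = R(z_i|x), factors[a] = (scope, q) with q(x_scope) = Q(y_a|x_scope)."""
    d_var = [[] for _ in range(n)]
    for a, (scope, _) in enumerate(factors):
        for i in scope:
            d_var[i].append(a)
    flat = {x: 1.0 / len(alphabet) for x in alphabet}
    nu = {(i, a): dict(flat) for a, (scope, _) in enumerate(factors) for i in scope}
    nu_hat = {(a, i): dict(flat) for (i, a) in nu}
    for _ in range(iters):
        for a, (scope, q) in enumerate(factors):          # Eq. (2.7)
            for pos, i in enumerate(scope):
                msg = {x: 0.0 for x in alphabet}
                for xs in product(alphabet, repeat=len(scope)):
                    w = q(xs)
                    for pos_j, j in enumerate(scope):
                        if pos_j != pos:
                            w *= nu[(j, a)][xs[pos_j]]
                    msg[xs[pos]] += w
                nu_hat[(a, i)] = normalize(msg)
        for (i, a) in nu:                                  # Eq. (2.6)
            msg = {x: prior[x] * local[i][x] for x in alphabet}
            for b in d_var[i]:
                if b != a:
                    for x in alphabet:
                        msg[x] *= nu_hat[(b, i)][x]
            nu[(i, a)] = normalize(msg)
    marginals = []                                         # Eq. (2.8)
    for i in range(n):
        msg = {x: prior[x] * local[i][x] for x in alphabet}
        for b in d_var[i]:
            for x in alphabet:
                msg[x] *= nu_hat[(b, i)][x]
        marginals.append(normalize(msg))
    return marginals

def exact_marginals(n, alphabet, prior, local, factors):
    """Brute-force marginals of the same posterior, for checking on small graphs."""
    marg = [{x: 0.0 for x in alphabet} for _ in range(n)]
    for xs in product(alphabet, repeat=n):
        w = 1.0
        for i, x in enumerate(xs):
            w *= prior[x] * local[i][x]
        for scope, q in factors:
            w *= q(tuple(xs[i] for i in scope))
        for i, x in enumerate(xs):
            marg[i][x] += w
    return [normalize(m) for m in marg]

def noisy_parity(y, eps):
    """Hypothetical soft kernel: Q(y|x) = 1-eps if sum(x) = y (mod 2), else eps."""
    return lambda xs: 1.0 - eps if sum(xs) % 2 == y else eps

# A tree-shaped example (X_0 - a_0 - X_1 - a_1 - {X_2, X_3}), on which BP is exact.
alphabet, prior = (0, 1), {0: 0.7, 1: 0.3}
local = [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}, {0: 0.3, 1: 0.7}, {0: 0.6, 1: 0.4}]
factors = [((0, 1), noisy_parity(1, 0.1)), ((1, 2, 3), noisy_parity(0, 0.2))]
bp = bp_marginals(4, alphabet, prior, local, factors)
exact = exact_marginals(4, alphabet, prior, local, factors)
print([round(m[1], 4) for m in bp], [round(m[1], 4) for m in exact])
```

The sum over $x_{\partial a\setminus i}$ is done by plain enumeration, which is fine for the bounded function-node degrees of sparse systems but exponential in the degree.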



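Combining Eqs. (2.7) and (2.8), the BP marginal at variable node $i$ can be expressed as a function of variable-to-function node messages at neighboring variable nodes. We shall write

$$\nu^{(t+1)}_i = \mathsf{F}_i(\{\nu^{(t)}_{j\to b} : j\in\partial b\setminus i;\ b\in\partial i\}), \tag{2.9}$$

$$\mathsf{F}_i(\cdots)(x_i)\propto p(x_i)\,R^\theta(z_i\mid x_i)\prod_{b\in\partial i}\sum_{x_{\partial b\setminus i}}Q(y_b\mid x_{\partial b})\prod_{j\in\partial b\setminus i}\nu_{j\to b}(x_j). \tag{2.10}$$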

Notice that the mapping $\mathsf{F}_i(\cdots)$ only depends on the graph $G$ and on the observations $Y, Z(\theta)$ through the subgraph including the function nodes adjacent to $i$ and the corresponding variable nodes. Denoting such neighborhood as $B$, the corresponding observations as $Y_B, Z_B(\theta)$, and letting $\underline{\nu}^{(t)} = \{\nu^{(t)}_{j\to b} : j\in\partial b\setminus i;\ b\in\partial i\}$, we can rewrite Eq. (2.9) in the form

$$\nu^{(t+1)}_i = \mathsf{F}^n_i(\underline{\nu}^{(t)}; B, Y_B, Z_B(\theta)). \tag{2.11}$$

Here we made explicit all the dependence upon the graph and the observations. If $G$ is drawn randomly from the $\mathbb{G}(n, \alpha n, \gamma/n)$ ensemble, the neighborhood $B$, as well as the corresponding observations, converge in the large system limit to a well defined limit distribution. Further, the messages $\{\nu^{(t)}_{j\to b}\}$ above become iid and are distributed as $\nu^{(t)}_i$ (this is a consequence of the fact that the edge degrees are asymptotically Poisson). Their common distribution satisfies the density evolution distributional recursion

$$\nu^{(t+1)}\stackrel{d}{=}\mathsf{F}^\infty(\underline{\nu}^{(t)}; B, Y_B, Z_B(\theta)), \tag{2.12}$$

where $\underline{\nu}^{(t)} = \{\nu^{(t)}_e : e\in B\}$ are iid copies of $\nu^{(t)}$, and $B, Y_B, Z_B(\theta)$ are understood to be taken from their asymptotic distribution. We will be particularly concerned with the set of fixed points of the above distributional recursion. This is just the set of distributions $\rho$ over $\mathcal{M}(\mathcal{X})$ such that, if $\nu^{(t)}$ has distribution $\rho$, then $\nu^{(t+1)}$ has distribution $\rho$ as well.

2.3 Main results 

For stating our first result, it is convenient to introduce a shorthand notation. For any $U\subseteq V$, we write

$$\mathbb{P}_U\{x_U\} = \mathbb{P}\{X_U = x_U\mid Y, Z(\theta)\}. \tag{2.13}$$

Notice that, being a function of $Y$ and $Z(\theta)$, $\mathbb{P}_U\{x_U\}$ is a random variable. The theorem below shows that, if $U$ is a random subset of $V$ of bounded size, then $\mathbb{P}_U$ factorizes approximately over the nodes $i\in U$. The accuracy of this approximation is measured in terms of total variation distance. Recall that, given two distributions $q_1$ and $q_2$ on the same finite set $\mathcal{S}$, their total variation distance is

$$\|q_1 - q_2\|_{TV} = \frac12\sum_{x\in\mathcal{S}}|q_1(x) - q_2(x)|. \tag{2.14}$$

² Explicitly, $q_1(x)\propto q_2(x)$ if there exists a constant $C > 0$ such that $q_1(x) = C\,q_2(x)$ for all $x$.

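Theorem 2.2. Consider an observation system on the graph $G = (V, F, E)$. Let $k\in\mathbb{N}$, and $i(1),\dots,i(k)$ be uniformly random in $V$. Then, for any $\epsilon > 0$,

$$\int_0^\epsilon\mathbb{E}_{i(1)\dots i(k)}\,\mathbb{E}\,\big\|\mathbb{P}_{i(1)\dots i(k)} - \mathbb{P}_{i(1)}\cdots\mathbb{P}_{i(k)}\big\|_{TV}\,d\theta\le(|\mathcal{X}|+1)^k A_{n,k}\sqrt{H(X_1)\,\epsilon/n} = O(n^{-1/2}), \tag{2.15}$$

where $A_{n,k}\le\exp\{k^2/n\}$ for $k < n/2$, and the asymptotic behavior $O(n^{-1/2})$ holds as $n\to\infty$ with $k$ and $\mathcal{X}$ fixed.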
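The quantity controlled by (2.15) is easy to evaluate once a joint law is available. The sketch below implements (2.14) for distributions stored as dictionaries and applies it, for $k = 2$, to the distance between a joint law $\mathbb{P}_{ij}$ and the product of its marginals. The function names and the two example laws are illustrative.

```python
def tv_distance(q1, q2):
    """||q1 - q2||_TV = (1/2) sum_x |q1(x) - q2(x)| for dicts on a finite set, Eq. (2.14)."""
    support = set(q1) | set(q2)
    return 0.5 * sum(abs(q1.get(x, 0.0) - q2.get(x, 0.0)) for x in support)

def decoupling_error(joint):
    """||P_ij - P_i P_j||_TV for a joint law given as {(x_i, x_j): probability}."""
    p_i, p_j = {}, {}
    for (a, b), w in joint.items():
        p_i[a] = p_i.get(a, 0.0) + w
        p_j[b] = p_j.get(b, 0.0) + w
    product_law = {(a, b): p_i[a] * p_j[b] for a in p_i for b in p_j}
    return tv_distance(joint, product_law)

copied = {(0, 0): 0.5, (1, 1): 0.5}                           # X_j = X_i: fully correlated
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # product law
print(decoupling_error(copied), decoupling_error(independent))
```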



The next result establishes that the BP equation (2.9) is approximately satisfied by the actual marginals. For any $i, j\in V$ such that $i, j\in\partial b$ for some common function node $b\in F$, let

$$\mu^{(j)}_i(x_i) = \mathbb{P}\{X_i = x_i\mid Y_a : j\notin\partial a;\ Z_l(\theta) : l\neq j\}. \tag{2.16}$$

This is nothing but the conditional distribution of $X_i$ with respect to the graph from which $j$ has been 'taken out.'

Theorem 2.3. Consider a sparse observation system on a random graph $G = (V, F, E)$ from the $\mathbb{G}(n, \alpha n, \gamma/n)$ ensemble. Assume the noisy observations to be $M$-soft. Then there exists a constant $A$ depending on $t$, $\alpha$, $\gamma$, $M$, $|\mathcal{X}|$, $\epsilon$, such that for any $i\in V$, and any $n$

/"EcEII/i^ - fn{l^'/haem.,eaa\^)\\Tv d9 < ^ . (2.17) 
Jo 

Finally, we provide a characterization of the asymptotic distribution of the one-variable marginals. Recall that $\mathcal{M}(\mathcal{X})$ denotes the set of probability distributions over $\mathcal{X}$, i.e., the $(|\mathcal{X}|-1)$-dimensional standard simplex. We further let $\mathcal{M}^2(\mathcal{X})$ be the set of probability measures over $\mathcal{M}(\mathcal{X})$ ($\mathcal{M}(\mathcal{X})$ being endowed with the Borel $\sigma$-field induced by $\mathbb{R}^{|\mathcal{X}|-1}$). This can be equipped with the smallest $\sigma$-field that makes $F_A : \rho\mapsto\rho(A)$ measurable for any Borel subset $A$ of $\mathcal{M}(\mathcal{X})$.

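Theorem 2.4. Consider an observation system on a random graph $G = (V, F, E)$ from the $\mathbb{G}(n, \alpha n, \gamma/n)$ ensemble, and assume the noisy observations to be soft. Let $\varphi : \mathcal{M}(\mathcal{X})^k\to\mathbb{R}$ be a Lipschitz continuous function on $\mathcal{M}(\mathcal{X})^k = \mathcal{M}(\mathcal{X})\times\cdots\times\mathcal{M}(\mathcal{X})$ ($k$ times).

Then for almost any $\theta\in[0,\epsilon]$ there exist an infinite subsequence $\mathcal{R}_\theta\subseteq\mathbb{N}$ and a probability distribution $\mathcal{S}_\theta$ over $\mathcal{M}^2(\mathcal{X})$, supported on the fixed points of the density evolution recursion (2.12), such that the following happens. Given any fixed subset of variable nodes $\{i(1),\dots,i(k)\}\subseteq V$,

$$\lim_{n\in\mathcal{R}_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^\theta_{i(1)},\dots,\mu^\theta_{i(k)})\} = \int\Big\{\int\varphi(\mu_1,\dots,\mu_k)\,\rho(d\mu_1)\cdots\rho(d\mu_k)\Big\}\,\mathcal{S}_\theta(d\rho). \tag{2.18}$$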



3 Proof of Theorem 2.2 (correlations)

Lemma 3.1. For any observation system and any $\epsilon > 0$,

$$\frac1n\sum_{i,j\in V}\int_0^\epsilon I(X_i;X_j\mid Y, Z(\theta))\,d\theta\le 2H(X_1). \tag{3.1}$$

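Proof. For $U\subseteq V$, let us denote by $Z^{(U)}(\theta)$ the vector obtained by setting $Z^{(U)}_i(\theta) = Z_i(\theta)$ whenever $i\notin U$, and $Z^{(U)}_i(\theta) = (Z_i, *)$ if $i\in U$. The proof is based on the two identities below

$$\frac{d}{d\theta}H(X\mid Y, Z(\theta)) = -\sum_{i\in V}H(X_i\mid Y, Z^{(i)}(\theta)), \tag{3.2}$$

$$\frac{d^2}{d\theta^2}H(X\mid Y, Z(\theta)) = \sum_{i\neq j\in V}I(X_i;X_j\mid Y, Z^{(ij)}(\theta)). \tag{3.3}$$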
Before proving these identities, let us show that they imply the thesis. By the fundamental theorem of
calculus, we have 



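$$\frac1n\sum_{i\neq j\in V}\int_0^\epsilon I(X_i;X_j\mid Y, Z^{(ij)}(\theta))\,d\theta = \frac1n\sum_{i\in V}H(X_i\mid Y, Z^{(i)}(0)) - \frac1n\sum_{i\in V}H(X_i\mid Y, Z^{(i)}(\epsilon)) \tag{3.4}$$

$$\le\frac1n\sum_{i\in V}H(X_i\mid Y, Z^{(i)}(0))\le H(X_1). \tag{3.5}$$

Further, if $z^{(U)}(\theta)$ is the vector obtained from $z(\theta)$ by replacing $z_l(\theta)$ with $(z_l, *)$ for every $l\in U$, then

$$I(X_i;X_j\mid Y, Z(\theta) = z(\theta))\le I(X_i;X_j\mid Y, Z^{(ij)}(\theta) = z^{(ij)}(\theta)). \tag{3.6}$$

In fact the left hand side vanishes whenever $z^{(ij)}(\theta)\neq z(\theta)$, i.e. whenever $X_i$ or $X_j$ is revealed. Averaging (3.6) over $Z(\theta)$ shows that the off-diagonal terms in (3.1) are bounded by the left hand side of (3.4). The proof is completed by upper bounding the diagonal terms in the sum (3.1) as $I(X_i;X_i\mid Y, Z(\theta)) = H(X_i\mid Y, Z(\theta))\le H(X_1)$.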

Let us now consider the identities (3.2) and (3.3). These already appeared in the literature [MMU05, MMRU05, Mac07]. We reproduce the proof here for the sake of self-containedness.

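Let us begin with Eq. (3.2). It is convenient to slightly generalize the model by letting the channel parameter $\theta$ depend on the variable node. In other words, given a vector $\underline\theta = (\theta_1,\dots,\theta_n)$, we let, for each $i\in V$, $Z_i(\underline\theta) = (Z_i, X_i)$ with probability $\theta_i$, and $Z_i(\underline\theta) = (Z_i, *)$ otherwise. Noticing that $H(X\mid Y, Z(\underline\theta)) = H(X_i\mid Y, Z(\underline\theta)) + H(X\mid X_i, Y, Z(\underline\theta))$, and that the latter term does not depend upon $\theta_i$, we have

$$\frac{\partial}{\partial\theta_i}H(X\mid Y, Z(\underline\theta)) = \frac{\partial}{\partial\theta_i}H(X_i\mid Y, Z(\underline\theta)) = -H(X_i\mid Y, Z^{(i)}(\underline\theta)), \tag{3.7}$$

where the second equality is a consequence of $H(X_i\mid Y, Z(\underline\theta)) = (1-\theta_i)H(X_i\mid Y, Z^{(i)}(\underline\theta))$. Equation (3.2) follows by simple calculus, taking $\theta_i = \theta_i(\theta) = \theta$ for all $i\in V$.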

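Equation (3.3) is proved analogously. First, the above calculation implies that the second derivative with respect to $\theta_i$ vanishes for any $i\in V$. For $i\neq j$, we use the chain rule to get $H(X\mid Y, Z(\underline\theta)) = H(X_i, X_j\mid Y, Z(\underline\theta)) + H(X\mid X_i, X_j, Y, Z(\underline\theta))$, and then write

$$H(X_i, X_j\mid Y, Z(\underline\theta)) = (1-\theta_i)(1-\theta_j)H(X_i, X_j\mid Y, Z^{(ij)}(\underline\theta)) + \theta_i(1-\theta_j)H(X_j\mid X_i, Y, Z^{(ij)}(\underline\theta)) + (1-\theta_i)\theta_j H(X_i\mid X_j, Y, Z^{(ij)}(\underline\theta)),$$

whence the mixed derivative with respect to $\theta_i$ and $\theta_j$ results in $I(X_i;X_j\mid Y, Z^{(ij)}(\underline\theta))$. As above, Eq. (3.3) is recovered by letting $\theta_i = \theta_i(\theta) = \theta$ for any $i\in V$. □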

In the next proof we will use a technical device that has been developed within the mathematical theory of spin glasses (see [Tal06], and [GT04, GM07] for applications to sparse models). We start by defining a family of real random variables indexed by a variable node $i\in V$ and by $\xi\in\mathcal{X}$:

$$S_i(\xi)\equiv\mathbb{I}(X_i = \xi) - \mathbb{P}\{X_i = \xi\mid Y, Z(\theta)\}. \tag{3.8}$$

We will also use $S(\xi) = (S_1(\xi),\dots,S_n(\xi))$ to denote the corresponding vector.

Next we let $X^{(1)} = (X^{(1)}_1,\dots,X^{(1)}_n)$ and $X^{(2)} = (X^{(2)}_1,\dots,X^{(2)}_n)$ be two iid assignments of the hidden variables, both distributed according to the conditional law $\mathbb{P}_{X\mid Y, Z(\theta)}$. If we let $(Y, Z(\theta))$ be distributed according to the original (unconditional) law $\mathbb{P}_{Y, Z(\theta)}$, this defines a larger probability space, generated by $(X^{(1)}, X^{(2)}, Y, Z(\theta))$. Notice that the pair $(X^{(1)}, Y, Z(\theta))$ and $(X^{(2)}, Y, Z(\theta))$ is exchangeable, each of the terms being distributed as $(X, Y, Z(\theta))$.

In terms of $X^{(1)}$ and $X^{(2)}$ we can then define $S^{(1)}(\xi)$ and $S^{(2)}(\xi)$, and introduce the overlap

$$Q(\xi)\equiv\frac1n\,S^{(1)}(\xi)\cdot S^{(2)}(\xi) = \frac1n\sum_{i\in V}S^{(1)}_i(\xi)\,S^{(2)}_i(\xi). \tag{3.9}$$
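The overlap can be computed directly from two replicas and the conditional marginals. The sampling part of the sketch assumes, hypothetically, a fully decoupled (product) posterior with uniform marginals, so the two replicas are drawn coordinate by coordinate and $Q(\xi)$ should be close to 0. The function name is illustrative.

```python
import random

def overlap(x1, x2, marg, xi):
    """Q(xi) = (1/n) sum_i S_i^(1)(xi) S_i^(2)(xi), where
    S_i^(r)(xi) = 1{X_i^(r) = xi} - P{X_i = xi | Y, Z(theta)} (Eqs. (3.8)-(3.9)).
    marg[i] is the conditional marginal of X_i, given as a dict."""
    total = 0.0
    for i, m in enumerate(marg):
        s1 = (1.0 if x1[i] == xi else 0.0) - m[xi]
        s2 = (1.0 if x2[i] == xi else 0.0) - m[xi]
        total += s1 * s2
    return total / len(marg)

# Hypothetical fully decoupled posterior: replicas sampled coordinate by coordinate.
rng = random.Random(12)
n = 20_000
marg = [{0: 0.5, 1: 0.5}] * n
x1 = [rng.randint(0, 1) for _ in range(n)]
x2 = [rng.randint(0, 1) for _ in range(n)]
print(overlap(x1, x2, marg, 1))   # close to 0 for a product posterior
```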

Since $|S_i(\xi)|\le 1$, we have $|Q(\xi)|\le 1$ as well. Our next result shows that the conditional distribution of $Q(\xi)$ given $Y$ and $Z(\theta)$ is indeed very concentrated, for most values of $\theta$. The result is expressed in terms of the conditional variance

$$\mathrm{Var}(Q(\xi)\mid Y, Z(\theta))\equiv\mathbb{E}\big\{\mathbb{E}[Q(\xi)^2\mid Y, Z(\theta)] - \mathbb{E}[Q(\xi)\mid Y, Z(\theta)]^2\big\}. \tag{3.10}$$
Lemma 3.2. For any observation system and any $\epsilon > 0$,

$$\int_0^\epsilon\mathrm{Var}(Q(\xi)\mid Y, Z(\theta))\,d\theta\le 4H(X_1)/n. \tag{3.11}$$



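Proof. In order to lighten the notation, write $\mathbb{E}\{\cdots\}$ for $\mathbb{E}\{\cdots\mid Y = y, Z(\theta) = z(\theta)\}$ (and analogously for $\mathbb{P}\{\cdots\}$), and drop the argument $\xi$ from $S^{(r)}_i(\xi)$. Then

$$\mathrm{Var}(Q\mid Y = y, Z(\theta) = z(\theta)) = \mathbb{E}\Big\{\Big(\frac1n\sum_{i\in V}S^{(1)}_iS^{(2)}_i\Big)^2\Big\} - \mathbb{E}\Big\{\frac1n\sum_{i\in V}S^{(1)}_iS^{(2)}_i\Big\}^2$$

$$= \frac1{n^2}\sum_{i,j\in V}\Big\{\mathbb{E}\{S^{(1)}_iS^{(1)}_jS^{(2)}_iS^{(2)}_j\} - \mathbb{E}\{S^{(1)}_iS^{(2)}_i\}\,\mathbb{E}\{S^{(1)}_jS^{(2)}_j\}\Big\}$$

$$= \frac1{n^2}\sum_{i,j\in V}\Big\{\mathbb{E}\{S_iS_j\}^2 - \mathbb{E}\{S_i\}^2\,\mathbb{E}\{S_j\}^2\Big\}.$$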



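In the last step we used the fact that $S^{(1)}(\xi)$ and $S^{(2)}(\xi)$ are conditionally independent given $Y$ and $Z(\theta)$, and used the notation $S_i(\xi)$ for either of them (recall that $S^{(1)}(\xi)$ and $S^{(2)}(\xi)$ are identically distributed). Notice that

$$\mathbb{E}\{S_i(\xi)\} = \mathbb{E}\big[\mathbb{I}(X_i = \xi) - \mathbb{P}\{X_i = \xi\mid Y, Z(\theta)\}\,\big|\,Y = y, Z(\theta) = z(\theta)\big] = 0, \tag{3.12}$$

$$\mathbb{E}\{S_i(\xi)S_j(\xi)\} = \mathbb{E}\big\{[\mathbb{I}(X_i = \xi) - \mathbb{P}\{X_i = \xi\}]\,[\mathbb{I}(X_j = \xi) - \mathbb{P}\{X_j = \xi\}]\big\} = \mathbb{P}\{X_i = \xi, X_j = \xi\} - \mathbb{P}\{X_i = \xi\}\,\mathbb{P}\{X_j = \xi\}. \tag{3.13}$$

Therefore

$$\mathrm{Var}(Q(\xi)\mid Y = y, Z(\theta) = z(\theta)) = \frac1{n^2}\sum_{i,j\in V}\big(\mathbb{P}\{X_i = \xi, X_j = \xi\} - \mathbb{P}\{X_i = \xi\}\,\mathbb{P}\{X_j = \xi\}\big)^2\le\frac2{n^2}\sum_{i,j\in V}I(X_i;X_j\mid Y = y, Z(\theta) = z(\theta)).$$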

In the last step we used the inequality (valid for any two distributions $p_1, p_2$ over a finite set $\mathcal{S}$)

$$\sum_{x\in\mathcal{S}}|p_1(x) - p_2(x)|^2\le 2D(p_1\|p_2), \tag{3.14}$$

and applied it to the joint distribution of $X_i$ and $X_j$ and the product of their marginals. The thesis follows by integrating over $y$ and $z(\theta)$ with the measure $\mathbb{P}_{Y, Z(\theta)}$ and using Lemma 3.1. □
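Inequality (3.14) can be checked numerically. It follows from Pinsker's inequality, since $\sum_x|p_1(x)-p_2(x)|^2\le(\sum_x|p_1(x)-p_2(x)|)^2\le 2D(p_1\|p_2)$ with $D$ in nats, and it holds a fortiori with $D$ in bits. The sketch below draws random pairs of distributions on four points and records the worst ratio of the two sides. The sampling scheme (normalized exponentials) is an arbitrary illustrative choice.

```python
import math
import random

def kl(p1, p2):
    """D(p1||p2) in nats; assumes p2 > 0 wherever p1 > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p1, p2) if a > 0)

def sq_dist(p1, p2):
    """Sum_x |p1(x) - p2(x)|^2, the left hand side of (3.14)."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2))

def random_dist(k, rng):
    """A random point of the simplex, from normalized Exp(1) weights."""
    w = [rng.expovariate(1.0) for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]

rng = random.Random(13)
pairs = [(random_dist(4, rng), random_dist(4, rng)) for _ in range(1000)]
worst = max(sq_dist(p, q) / kl(p, q) for p, q in pairs)
print(worst)   # stays below 2
```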

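Proof (Theorem 2.2). We start by noticing that, since $|Q(\xi)|\le 1$ and $\mathbb{E}\{Q(\xi)\}=0$, we have, for any $\xi_1,\dots,\xi_k\in\mathcal{X}$,

$$|\mathbb{E}\{Q(\xi_1)\cdots Q(\xi_k)\}|\le\mathbb{E}\{|Q(\xi_1)Q(\xi_2)|\}\le\sqrt{\mathbb{E}\{Q(\xi_1)^2\}\,\mathbb{E}\{Q(\xi_2)^2\}}\le\tfrac{1}{2}\mathrm{Var}(Q(\xi_1)|Y=y,Z=z(\theta))+\tfrac{1}{2}\mathrm{Var}(Q(\xi_2)|Y=y,Z=z(\theta))\,,$$

(where we assumed, without loss of generality, $k\ge 2$). Integrating with respect to $y$ and $z(\theta)$ with the measure $\mathbb{P}_{Y,Z(\theta)}$, and using Lemma 3.2, we obtain

$$\int_0^{\epsilon}\mathbb{E}\,\mathbb{E}\{Q(\xi_1)\cdots Q(\xi_k)|Y,Z(\theta)\}\,d\theta\le\frac{4H(X_1)}{n}\,. \qquad (3.15)$$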






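On the other hand

$$\mathbb{E}\{Q(\xi_1)\cdots Q(\xi_k)\}=\frac{1}{n^k}\sum_{j(1)\dots j(k)\in V}\mathbb{E}\{S^{(1)}_{j(1)}(\xi_1)S^{(2)}_{j(1)}(\xi_1)\cdots S^{(1)}_{j(k)}(\xi_k)S^{(2)}_{j(k)}(\xi_k)\} \qquad (3.16)$$
$$=\frac{1}{n^k}\sum_{j(1)\dots j(k)\in V}\mathbb{E}\{S_{j(1)}(\xi_1)\cdots S_{j(k)}(\xi_k)\}^2 \qquad (3.17)$$
$$\ge\frac{1}{n^k}\sum_{i(1)\dots i(k)}\mathbb{E}\{S_{i(1)}(\xi_1)\cdots S_{i(k)}(\xi_k)\}^2\,, \qquad (3.18)$$

where the last sum runs over ordered $k$-tuples of distinct variable nodes.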



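Putting together Eqs. (3.15) and (3.18), letting $B_{n,k}=n^k/[n(n-1)\cdots(n-k+1)]$, denoting by $\mathbb{E}_{i(1)\dots i(k)}$ the expectation with respect to a uniformly random $k$-tuple of distinct variable nodes, and taking expectation with respect to $Y$ and $Z(\theta)$, we get

$$\int_0^{\epsilon}\mathbb{E}_{i(1)\dots i(k)}\,\mathbb{E}\left\{\mathbb{E}\{S_{i(1)}(\xi_1)\cdots S_{i(k)}(\xi_k)|Y,Z(\theta)\}^2\right\}d\theta\le\frac{4B_{n,k}H(X_1)}{n}\,, \qquad (3.19)$$

which, by the Cauchy-Schwarz inequality, implies

$$\int_0^{\epsilon}\mathbb{E}_{i(1)\dots i(k)}\,\mathbb{E}\left\{\left|\mathbb{E}\{S_{i(1)}(\xi_1)\cdots S_{i(k)}(\xi_k)|Y,Z(\theta)\}\right|\right\}d\theta\le\sqrt{\frac{4\epsilon B_{n,k}H(X_1)}{n}}\,. \qquad (3.20)$$

Next notice that

$$\|\mathbb{P}_{i(1)\dots i(k)}-\mathbb{P}_{i(1)}\cdots\mathbb{P}_{i(k)}\|_{TV}=\frac{1}{2}\sum_{\xi_1\dots\xi_k}\left|\mathbb{P}_{i(1)\dots i(k)}\{\xi_1,\dots,\xi_k\}-\mathbb{P}_{i(1)}\{\xi_1\}\cdots\mathbb{P}_{i(k)}\{\xi_k\}\right|$$
$$=\frac{1}{2}\sum_{\xi_1\dots\xi_k}\left|\mathbb{E}\left\{\mathbb{I}(X_{i(1)}=\xi_1)\cdots\mathbb{I}(X_{i(k)}=\xi_k)-\mathbb{P}_{i(1)}\{\xi_1\}\cdots\mathbb{P}_{i(k)}\{\xi_k\}\right\}\right|$$
$$=\frac{1}{2}\sum_{\xi_1\dots\xi_k}\left|\sum_{J\subseteq[k],\,|J|\ge 2}\mathbb{E}\Big\{\prod_{\alpha\in J}S_{i(\alpha)}(\xi_\alpha)\Big\}\prod_{\beta\in[k]\setminus J}\mathbb{P}_{i(\beta)}\{\xi_\beta\}\right|\,,$$

where in the last identity we wrote $\mathbb{I}(X_{i(\alpha)}=\xi_\alpha)=S_{i(\alpha)}(\xi_\alpha)+\mathbb{P}_{i(\alpha)}\{\xi_\alpha\}$, expanded the product over $\alpha\in[k]$, and used $\mathbb{E}\{S_i(\xi)\}=0$ (cf. Eq. (3.12)) to drop the terms with $|J|=1$ (the term $J=\emptyset$ cancels with the product of marginals).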



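Using the triangle inequality, and then summing over $\xi_\beta$, $\beta\in[k]\setminus J$,

$$\|\mathbb{P}_{i(1)\dots i(k)}-\mathbb{P}_{i(1)}\cdots\mathbb{P}_{i(k)}\|_{TV}\le\frac{1}{2}\sum_{J\subseteq[k],\,|J|\ge 2}\;\sum_{\{\xi_\alpha\}_{\alpha\in J}}\left|\mathbb{E}\Big\{\prod_{\alpha\in J}S_{i(\alpha)}(\xi_\alpha)\Big\}\right|\,.$$

Taking expectation with respect to $Y$, $Z(\theta)$ and to $\{i(1),\dots,i(k)\}$, a uniformly random subset of $V$, we obtain

$$\mathbb{E}_{i(1)\dots i(k)}\,\mathbb{E}\,\|\mathbb{P}_{i(1)\dots i(k)}-\mathbb{P}_{i(1)}\cdots\mathbb{P}_{i(k)}\|_{TV}\le\frac{1}{2}\sum_{l=2}^{k}\binom{k}{l}\sum_{\xi_1\dots\xi_l\in\mathcal{X}}\mathbb{E}_{i(1)\dots i(l)}\,\mathbb{E}\left|\mathbb{E}\{S_{i(1)}(\xi_1)\cdots S_{i(l)}(\xi_l)|Y,Z(\theta)\}\right|\,.$$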
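The expansion behind these two displays can be checked numerically. The minimal Python sketch below builds an arbitrary joint law of $k=3$ variables on $\{0,1\}$ (a hypothetical stand-in for $\mathbb{P}_{i(1)\dots i(k)}$; names such as `moment` are illustrative), verifies for every $(\xi_1,\dots,\xi_k)$ the exact identity $\mathbb{P}\{\xi_1,\dots,\xi_k\}-\prod_\beta\mathbb{P}_\beta\{\xi_\beta\}=\sum_{|J|\ge 2}\mathbb{E}\{\prod_{\alpha\in J}S_\alpha(\xi_\alpha)\}\prod_{\beta\notin J}\mathbb{P}_\beta\{\xi_\beta\}$, and then the resulting triangle-inequality bound, taking $\|\cdot\|_{TV}$ as half the $\ell_1$ distance.

```python
import itertools
import math
import random


def joint_law(k, alphabet, seed=2):
    """Hypothetical joint law of (X_{i(1)}, ..., X_{i(k)}): arbitrary positive weights."""
    rng = random.Random(seed)
    cfgs = list(itertools.product(alphabet, repeat=k))
    w = [rng.random() + 0.01 for _ in cfgs]
    s = sum(w)
    return {x: wi / s for x, wi in zip(cfgs, w)}


def marginals(P, k, alphabet):
    return [{a: sum(p for x, p in P.items() if x[i] == a) for a in alphabet} for i in range(k)]


def moment(P, marg, J, xis):
    """E{prod_{alpha in J} S_alpha(xi_alpha)}, where S_alpha(xi) = 1(X_alpha = xi) - P_alpha{xi}."""
    return sum(p * math.prod((x[al] == xi) - marg[al][xi] for al, xi in zip(J, xis))
               for x, p in P.items())


k, A = 3, (0, 1)
P = joint_law(k, A)
marg = marginals(P, k, A)
subsets = [J for r in range(2, k + 1) for J in itertools.combinations(range(k), r)]


def product_of_marginals(xi):
    return math.prod(marg[b][xi[b]] for b in range(k))


# Exact identity, checked at every point (xi_1, ..., xi_k)
max_err = max(
    abs(P[xi] - product_of_marginals(xi)
        - sum(moment(P, marg, J, [xi[a] for a in J])
              * math.prod(marg[b][xi[b]] for b in range(k) if b not in J)
              for J in subsets))
    for xi in itertools.product(A, repeat=k))

# Triangle-inequality bound on the total variation distance
tv = 0.5 * sum(abs(P[xi] - product_of_marginals(xi)) for xi in itertools.product(A, repeat=k))
bound = 0.5 * sum(abs(moment(P, marg, J, xs))
                  for J in subsets for xs in itertools.product(A, repeat=len(J)))
print(max_err, tv, bound)
```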
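Integrating over $\theta$ and using Eq. (3.20), we get

$$\int_0^{\epsilon}\mathbb{E}_{i(1)\dots i(k)}\,\mathbb{E}\,\|\mathbb{P}_{i(1)\dots i(k)}-\mathbb{P}_{i(1)}\cdots\mathbb{P}_{i(k)}\|_{TV}\,d\theta\le\frac{1}{2}\sum_{l=2}^{k}\binom{k}{l}|\mathcal{X}|^l\sqrt{\frac{4\epsilon B_{n,l}H(X_1)}{n}}\,. \qquad (3.21)$$

By using $B_{n,l}\le B_{n,k}$, the right hand side is bounded as in Eq. (2.15), with $A_{n,k}=\sqrt{B_{n,k}}$. The bound on this coefficient is obtained by a standard manipulation (here we use $-\log(1-x)\le 2x$ for $x\in[0,1/2]$ and the hypothesis $k\le n/2$):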



$$B_{n,k}=\exp\Big\{-\sum_{l=1}^{k-1}\log\Big(1-\frac{l}{n}\Big)\Big\}\le\exp\Big\{\sum_{l=1}^{k-1}\frac{2l}{n}\Big\}=\exp\Big\{\frac{k(k-1)}{n}\Big\}\,, \qquad (3.22)$$

hence $A_{n,k}\le\exp\{k^2/2n\}$ as claimed. □
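The elementary bound (3.22) can be checked directly. Here $B_{n,k}=n^k/[n(n-1)\cdots(n-k+1)]$ is the ratio between the number of ordered $k$-tuples of variable nodes and the number of ordered $k$-tuples of distinct nodes. The minimal Python sketch below uses exact rational arithmetic from the standard `fractions` module and checks $B_{n,k}\le e^{k(k-1)/n}$ and $A_{n,k}=\sqrt{B_{n,k}}\le e^{k^2/2n}$ for all $1\le k\le n/2$ over a range of $n$.

```python
import math
from fractions import Fraction


def B(n, k):
    """B_{n,k} = n^k / (n (n-1) ... (n-k+1)), computed exactly."""
    falling = 1
    for l in range(k):
        falling *= n - l
    return Fraction(n ** k, falling)


def check_322(n_max=100):
    """Check B_{n,k} <= exp(k(k-1)/n) and sqrt(B_{n,k}) <= exp(k^2/(2n)) for 1 <= k <= n/2."""
    for n in range(2, n_max + 1):
        for k in range(1, n // 2 + 1):
            b = float(B(n, k))
            if b > math.exp(k * (k - 1) / n) * (1 + 1e-12):
                return False
            if math.sqrt(b) > math.exp(k * k / (2 * n)) * (1 + 1e-12):
                return False
    return True


print(check_322())
```

The small relative tolerance only guards against floating-point rounding in `math.exp`; the inequality itself holds because $l/n\le 1/2$ for every $l\le k-1$ when $k\le n/2$.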

Obviously, if the graph $G$ is 'sufficiently' random, the expectation over variable nodes $i(1),\dots,i(k)$ can be replaced by the expectation over $G$.

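Corollary 3.3. Let $G=(V,F,E)$ be a random bipartite graph whose distribution is invariant under permutation of the variable nodes in $V=[n]$. Then, for any observations system on $G=(V,F,E)$, any $k\in\mathbb{N}$, any $\epsilon>0$, and any (fixed) set of variable nodes $\{i(1),\dots,i(k)\}$,

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\,\|\mu_{i(1)\dots i(k)}-\mu_{i(1)}\times\cdots\times\mu_{i(k)}\|_{TV}\,d\theta\le(|\mathcal{X}|+1)^k A_{n,k}\sqrt{H(X_1)\epsilon/n}=O(n^{-1/2})\,, \qquad (3.23)$$

where the constant $A_{n,k}$ is as in Theorem 2.2.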

4 Random graph properties 

The proofs of Theorems 2.3 and 2.4 rely on some specific properties of the graph ensemble $\mathbb{G}(n,\alpha n,\gamma/n)$.

We begin with some further definitions concerning a generic bipartite graph $G=(V,F,E)$. Given $i,j\in V$, their graph-theoretic distance $d(i,j)$ is defined as the length of the shortest path from $i$ to $j$ on $G$. We follow the convention of measuring the length of a path on $G$ by the number of function nodes traversed by the path.

Given $i\in V$ and $t\in\mathbb{N}$ we let $B(i,t)$ be the subset of variable nodes $j$ whose distance from $i$ is at most $t$. With an abuse of notation, we use the same symbol to denote the subgraph induced by this set of vertices, i.e. the factor graph including those function nodes $a$ such that $\partial a\subseteq B(i,t)$ and all the edges incident on them. Further, we denote by $\bar B(i,t)$ the subset of variable nodes $j$ with $d(i,j)\ge t$, as well as the induced subgraph. Finally, $D(i,t)$ is the subset of vertices with $d(i,j)=t$. Equivalently, $D(i,t)$ is the intersection of $B(i,t)$ and $\bar B(i,t)$.

We will make use of two remarkable properties of the ensemble $\mathbb{G}(n,\alpha n,\gamma/n)$: (i) the convergence of any finite neighborhood in $G$ to an appropriate tree model; (ii) the conditional independence of such a neighborhood from the residual graph, given the neighborhood size.

The limit tree model is defined by the following sampling procedure, yielding a $t$-generations rooted random tree $T(t)$. If $t=0$, $T(t)$ is the trivial tree consisting of a single variable node. For $t\ge 1$, start from a distinguished root variable node $i$ and connect it to $l$ function nodes, whereby $l$ is a Poisson random variable with mean $\gamma\alpha$. For each such function node $a$, draw an independent Poisson$(\gamma)$ random variable $k_a$ and connect it to $k_a$ new variable nodes. Finally, for each of the 'first generation' variable nodes $j$, sample an independent random tree distributed as $T(t-1)$, and attach it by the root to $j$.
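The sampling procedure defining $T(t)$ translates directly into a recursive sampler. In the minimal Python sketch below (an illustration, not code from this paper), a variable node is represented as the list of its child function nodes, and a function node as the list of its child variable nodes; Poisson variables are drawn with Knuth's multiplication method so that only the standard library is needed. As in the text, depth is measured by the number of function-node generations.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's method: multiply uniforms until the running product drops below exp(-lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_tree(t, alpha, gamma, rng):
    """Sample T(t): the root gets Poisson(gamma*alpha) function nodes, each function node gets
    Poisson(gamma) new variable nodes, and each of those is the root of an independent T(t-1)."""
    if t == 0:
        return []  # trivial tree: a single variable node
    return [[sample_tree(t - 1, alpha, gamma, rng) for _ in range(poisson(gamma, rng))]
            for _ in range(poisson(gamma * alpha, rng))]


def depth(tree):
    """Number of function-node generations below a variable node."""
    return max((1 + max((depth(child) for child in fn), default=0) for fn in tree), default=0)


def size(tree):
    """(number of variable nodes, number of function nodes) in the tree."""
    v, f = 1, 0
    for fn in tree:
        f += 1
        for child in fn:
            cv, cf = size(child)
            v, f = v + cv, f + cf
    return v, f


rng = random.Random(0)
T = sample_tree(3, alpha=0.5, gamma=2.0, rng=rng)
print(depth(T), size(T))
```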

Proposition 4.1 (Convergence to random tree). Let $B(i,t)$ be the radius-$t$ neighborhood of any fixed variable node $i$ in a random graph $G\sim\mathbb{G}(n,\alpha n,\gamma/n)$, and $T(t)$ the random tree defined above. Given any (labeled) tree $T_*$, we write $B(i,t)\simeq T_*$ if $T_*$ is obtained by the depth-first relabeling of $B(i,t)$ following a pre-established convention*. Then

$$\lim_{n\to\infty}\mathbb{P}\{B(i,t)\simeq T_*\}=\mathbb{P}\{T(t)\simeq T_*\}\,. \qquad (4.1)$$

Proposition 4.2 (Bound on the neighborhood size). Let $B(i,t)$ be the radius-$t$ neighborhood of any fixed variable node $i$ in a random bipartite graph $G\sim\mathbb{G}(n,\alpha n,\gamma/n)$, and denote by $|B(i,t)|$ its size (number of variable and function nodes). Then, for any $\lambda>1$ there exists $C(\lambda,t)$ such that, for any $n$ and any $M>0$,

$$\mathbb{P}\{|B(i,t)|\ge M\}\le C(\lambda,t)\,\lambda^{-M}\,. \qquad (4.2)$$

Proof. Let us generalize our definition of neighborhood as follows. If $t$ is an integer, we let $B(i,t+1/2)$ be the subgraph including $B(i,t)$ together with all the function nodes that have at least one neighbor in $B(i,t)$ (as well as the edges to $B(i,t)$). We also let $D(i,t+1/2)$ be the set of function nodes that have at least one neighbor in $B(i,t)$ and at least one outside.

*For instance, one might agree to preserve the original lexicographic order among siblings.

Imagine exploring $B(i,t)$ in breadth-first fashion. For each $t$, $|B(i,t+1/2)|-|B(i,t)|$ is upper bounded by the sum of $|D(i,t)|$ iid binomial random variables counting the number of neighbors of each node in $D(i,t)$ which are not in $B(i,t)$. For $t$ integer (respectively, half-integer), each such variable is stochastically dominated by a binomial with parameters $\alpha n$ (respectively, $n$) and $\gamma/n$. Therefore $|B(i,t)|$ is stochastically dominated by $\sum_{s=0}^{2t}Z_n(s)$, where $\{Z_n(t)\}$ is a Galton-Watson process with offspring distribution $\mathrm{Binom}(n,\bar\gamma/n)$ and $\bar\gamma=\gamma\max(1,\alpha)$.

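By Markov's inequality

$$\mathbb{P}\{|B(i,t)|\ge M\}\le g^{(n)}_{2t}(\lambda)\,\lambda^{-M}\,,\qquad g^{(n)}_t(\lambda)\equiv\mathbb{E}\big\{\lambda^{\sum_{s=0}^{t}Z_n(s)}\big\}\,.$$

By elementary branching process theory, $g^{(n)}_t(\lambda)$ satisfies the recursion $g^{(n)}_{t+1}(\lambda)=\lambda\,\xi_n(g^{(n)}_t(\lambda))$, $g^{(n)}_0(\lambda)=\lambda$, with $\xi_n(\lambda)=(1+\bar\gamma(\lambda-1)/n)^n$. The thesis follows by $g^{(n)}_t(\lambda)\le g_t(\lambda)$, where $g_t(\lambda)$ is defined as $g^{(n)}_t(\lambda)$ but replacing $\xi_n(\lambda)$ with $\xi(\lambda)=e^{\bar\gamma(\lambda-1)}\ge\xi_n(\lambda)$. □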
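The branching-process step can be made concrete. The minimal Python sketch below computes $g_t(\lambda)=\mathbb{E}\{\lambda^{\sum_{s\le t}Z(s)}\}$ through the recursion $g_{t+1}(\lambda)=\lambda\,\xi(g_t(\lambda))$, $g_0(\lambda)=\lambda$, both for the binomial offspring generating function $\xi_n(x)=(1+\bar\gamma(x-1)/n)^n$ and for its Poisson majorant $\xi(x)=e^{\bar\gamma(x-1)}$, with $\bar\gamma=\gamma\max(1,\alpha)$ as in the text. It then evaluates the Markov bound $C(\lambda,t)\lambda^{-M}$, taking $C(\lambda,t)=g_{2t}(\lambda)$. That choice of constant and all function names are assumptions of this illustration. A Monte Carlo estimator is included to cross-check the recursion.

```python
import math
import random


def pgf_binom(x, n, gbar):
    """xi_n(x) = (1 + gbar (x - 1) / n)^n, the PGF of Binom(n, gbar/n)."""
    return (1.0 + gbar * (x - 1.0) / n) ** n


def pgf_poisson(x, gbar):
    """xi(x) = exp(gbar (x - 1)), the PGF of Poisson(gbar); xi_n(x) <= xi(x) for x >= 0."""
    return math.exp(gbar * (x - 1.0))


def progeny_pgf(lam, t, pgf):
    """g_t(lam) = E[lam ** (Z(0) + ... + Z(t))] for a Galton-Watson process with Z(0) = 1,
    computed via g_{s+1}(lam) = lam * pgf(g_s(lam)), g_0(lam) = lam."""
    g = lam
    for _ in range(t):
        g = lam * pgf(g)
    return g


def tail_bound(M, lam, t, gbar):
    """Markov bound P{|B(i,t)| >= M} <= g_{2t}(lam) * lam**(-M), using the Poisson majorant.
    g_t grows very quickly with lam, so lam should be taken close to 1."""
    return progeny_pgf(lam, 2 * t, lambda x: pgf_poisson(x, gbar)) * lam ** (-M)


def simulate_progeny_pgf(lam, t, n, gbar, samples, rng):
    """Monte Carlo estimate of g_t(lam) for Binom(n, gbar/n) offspring."""
    acc = 0.0
    for _ in range(samples):
        total = current = 1
        for _ in range(t):
            # the offspring of `current` individuals is Binom(current * n, gbar / n)
            current = sum(1 for _ in range(current * n) if rng.random() < gbar / n)
            total += current
        acc += lam ** total
    return acc / samples


print(tail_bound(M=60, lam=1.1, t=2, gbar=1.0))
```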

Proposition 4.3. Let $G=(V,F,E)$ be a random bipartite graph from the ensemble $\mathbb{G}(n,m,p)$. Then, conditional on $B(i,t)=(V(i,t),F(i,t),E(i,t))$, $\bar B(i,t)$ is a random bipartite graph on variable nodes $V\setminus V(i,t-1)$, function nodes $F\setminus F(i,t)$ and same edge probability $p$.

Proof. Condition on $B(i,t)=(V(i,t),F(i,t),E(i,t))$, and let $B(i,t-1)=(V(i,t-1),F(i,t-1),E(i,t-1))$ (notice that this is uniquely determined from $B(i,t)$). This is equivalent to conditioning on a given edge realization for any two vertices $k$, $a$ such that $k\in V(i,t)$ and $a\in F(i,t)$.

On the other hand, $\bar B(i,t)$ is the graph with variable node set $\bar V=V\setminus V(i,t-1)$, function node set $\bar F=F\setminus F(i,t)$, and edge set $\{(k,a)\in E:k\in\bar V,\,a\in\bar F\}$. Since this set of vertex couples is disjoint from the one we are conditioning upon, and by independence of edges in $G$, the claim follows. □



5 Proof of Theorem 2.3 (BP equations)



The proof of Theorem 2.3 hinges on the properties of the random factor graph $G$ discussed in the previous section, as well as on the correlation structure unveiled by Theorem 2.2.



5.1 The effect of changing G 

We first need to estimate the effect of changing the graph $G$ on the marginals.

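Lemma 5.1. Let $X$ be a random variable taking values in $\mathcal{X}$ and assume $X\to G\to Y_1\to B$ and $X\to G\to Y_2\to B$ to be Markov chains (here $G$, $Y_{1,2}$ and $B$ are arbitrary random variables, where $G$ stands for good and $B$ for bad). Then

$$\mathbb{E}\|\mathbb{P}\{X\in\cdot\,|Y_1\}-\mathbb{P}\{X\in\cdot\,|Y_2\}\|_{TV}\le 2\,\mathbb{E}\|\mathbb{P}\{X\in\cdot\,|G\}-\mathbb{P}\{X\in\cdot\,|B\}\|_{TV}\,. \qquad (5.1)$$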

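Proof. First consider a single Markov chain $X\to G\to Y\to B$. Then, by convexity of the total variation distance,

$$\mathbb{E}\|\mathbb{P}\{X\in\cdot\,|Y\}-\mathbb{P}\{X\in\cdot\,|B\}\|_{TV}=\mathbb{E}\big\|\mathbb{E}[\mathbb{P}\{X\in\cdot\,|G,Y\}\,|\,Y,B]-\mathbb{P}\{X\in\cdot\,|B\}\big\|_{TV} \qquad (5.2)$$
$$\le\mathbb{E}\|\mathbb{P}\{X\in\cdot\,|G,Y\}-\mathbb{P}\{X\in\cdot\,|B\}\|_{TV} \qquad (5.3)$$
$$=\mathbb{E}\|\mathbb{P}\{X\in\cdot\,|G\}-\mathbb{P}\{X\in\cdot\,|B\}\|_{TV}\,. \qquad (5.4)$$

The thesis is proved by applying this bound to both chains $X\to G\to Y_1\to B$ and $X\to G\to Y_2\to B$ and using the triangle inequality. □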
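Lemma 5.1 can be checked on a toy example. In the minimal Python sketch below, all alphabets and kernels are arbitrary assumptions for illustration. It builds $X\to G$, then two independent noisy copies $N_1$, $N_2$ of $G$, and sets $Y_a=(N_a,f(G))$ with $f(G)=G\bmod 2$ and $B=f(G)$. Since $B$ is a function of both $Y_1$ and $Y_2$, both $X\to G\to Y_1\to B$ and $X\to G\to Y_2\to B$ are Markov chains. The two sides of (5.1) are then computed exactly by enumeration.

```python
import itertools
import random

NX, NG, NN = 3, 4, 3  # hypothetical alphabet sizes of X, G and of the noisy outputs N1, N2


def rand_kernel(n_in, n_out, rng):
    """Random stochastic matrix: K[a][b] = P{output = b | input = a}."""
    rows = []
    for _ in range(n_in):
        w = [rng.random() + 0.01 for _ in range(n_out)]
        s = sum(w)
        rows.append([wi / s for wi in w])
    return rows


def build_joint(seed=4):
    """Joint law of (X, G, N1, N2): X -> G, then N1 and N2 are independent noisy copies of G."""
    rng = random.Random(seed)
    px = rand_kernel(1, NX, rng)[0]
    kg, k1, k2 = rand_kernel(NX, NG, rng), rand_kernel(NG, NN, rng), rand_kernel(NG, NN, rng)
    return {(x, g, n1, n2): px[x] * kg[x][g] * k1[g][n1] * k2[g][n2]
            for x, g, n1, n2 in itertools.product(range(NX), range(NG), range(NN), range(NN))}


def posterior_of_x(joint, key):
    """Map each value v of key(x, g, n1, n2) to the conditional law of X given key = v."""
    acc = {}
    for (x, g, n1, n2), p in joint.items():
        acc.setdefault(key(x, g, n1, n2), [0.0] * NX)[x] += p
    return {v: [a / sum(w) for a in w] for v, w in acc.items()}


def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def f(g):
    """B = f(G): a function of Y1 = (N1, f(G)) and of Y2 = (N2, f(G))."""
    return g % 2


joint = build_joint()
post_y1 = posterior_of_x(joint, lambda x, g, n1, n2: (n1, f(g)))
post_y2 = posterior_of_x(joint, lambda x, g, n1, n2: (n2, f(g)))
post_g = posterior_of_x(joint, lambda x, g, n1, n2: g)
post_b = posterior_of_x(joint, lambda x, g, n1, n2: f(g))

lhs = sum(p * tv(post_y1[(n1, f(g))], post_y2[(n2, f(g))]) for (x, g, n1, n2), p in joint.items())
rhs = 2 * sum(p * tv(post_g[g], post_b[f(g)]) for (x, g, n1, n2), p in joint.items())
print(lhs, rhs)  # (5.1) states that lhs <= rhs
```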



The next lemma estimates the effect of removing one variable node from the graph. Notice that the 
graph G is non-random. 
Lemma 5.2. Consider two observation systems associated to graphs $G=(V,F,E)$ and $G'=(V',F',E')$ whereby $V=V'\setminus\{j\}$, $F=F'$ and $E=E'\setminus\{(j,b):b\in\partial j\}$. Denote the corresponding observations as $(Y,Z(\theta))$ and $(Y',Z'(\theta))$. Then there exists a coupling of the observations such that, for any $i\in V$,

$$\mathbb{E}\|\mathbb{P}\{X_i\in\cdot\,|Y,Z(\theta)\}-\mathbb{P}\{X_i\in\cdot\,|Y',Z'(\theta)\}\|_{TV}\le 4\,\mathbb{E}\Big\|\mathbb{P}_{i,\partial^2 j}\{\cdots|Y_{F\setminus\partial j},Z(\theta)\}-\prod_{l\in\{i\}\cup\partial^2 j}\mathbb{P}_l\{\cdot\,|Y_{F\setminus\partial j},Z(\theta)\}\Big\|_{TV}\,, \qquad (5.5)$$

where $\partial^2 j=\{l\in V:d(j,l)=1\}$ and we used the shorthand $\mathbb{P}_U\{\cdots|Y_{F\setminus\partial j},Z(\theta)\}$ for $\mathbb{P}\{X_U\in\cdots|Y_{F\setminus\partial j},Z(\theta)\}$.

The coupling consists in sampling $X=\{X_i:i\in V'\}$ from its (iid) distribution and then $(Y,Z(\theta))$ and $(Y',Z'(\theta))$ as observations of this configuration $X$, in such a way that $Z_i(\theta)=Z'_i(\theta)$ for $i\in V$ and $Y_a=Y'_a$ for any $a\in F$ such that $\partial a\subseteq V$.

Proof. Let us describe the coupling more explicitly. First sample $X=\{X_i:i\in V\}$ and $X'=\{X'_i:i\in V'\}$ in such a way that $X_i=X'_i$ for any $i\in V$. Then, for any $i\in V$, sample $Z_i(\theta)$, $Z'_i(\theta)$ conditionally on $X_i=X'_i$ in such a way that $Z_i(\theta)=Z'_i(\theta)$. Sample $Z'_j(\theta)$ conditionally on $X'_j$. For any $a\in F$ such that $\partial a\subseteq V$, sample $Y_a$, $Y'_a$ conditionally on $X_{\partial a}=X'_{\partial a}$ in such a way that $Y_a=Y'_a$. Finally, for $a\in\partial j$, sample $Y_a$, $Y'_a$ independently, conditionally on $X_{\partial a}$ and $X'_{\partial a}$.
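Notice that the following are Markov chains

$$X_i\to(X_{\partial^2 j},Y,Z(\theta))\to(Y,Z(\theta))\to(Y_{F\setminus\partial j},Z(\theta))\,, \qquad (5.6)$$
$$X_i\to(X_{\partial^2 j},Y,Z(\theta))\to(Y',Z'(\theta))\to(Y_{F\setminus\partial j},Z(\theta))\,. \qquad (5.7)$$

The only non-trivial step is $(X_{\partial^2 j},Y,Z(\theta))\to(Y',Z'(\theta))$. Notice that, once $X_{\partial^2 j}$ is known, $Y_{\partial j}$ is conditionally independent from the other random variables. Therefore we can produce $(Y',Z'(\theta))$ by first scratching $Y_{\partial j}$, then sampling $X_j$ independently, next sampling $Y'_{\partial j}$ and $Z'_j(\theta)$, and finally scratching both $X_j$ and $X_{\partial^2 j}$.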

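Applying Lemma 5.1 to the chains above, we get

$$\mathbb{E}\|\mathbb{P}\{X_i\in\cdot\,|Y,Z(\theta)\}-\mathbb{P}\{X_i\in\cdot\,|Y',Z'(\theta)\}\|_{TV}\le 2\,\mathbb{E}\|\mathbb{P}\{X_i\in\cdot\,|Y_{F\setminus\partial j},Z(\theta)\}-\mathbb{P}\{X_i\in\cdot\,|X_{\partial^2 j},Y,Z(\theta)\}\|_{TV}$$
$$=2\,\mathbb{E}\|\mathbb{P}\{X_i\in\cdot\,|Y_{F\setminus\partial j},Z(\theta)\}-\mathbb{P}\{X_i\in\cdot\,|X_{\partial^2 j},Y_{F\setminus\partial j},Z(\theta)\}\|_{TV}\,,$$

where in the last step we used the fact that $Y_{\partial j}$ is conditionally independent of $X_i$, given $X_{\partial^2 j}$. The thesis is proved by applying, conditionally on $(Y_{F\setminus\partial j},Z(\theta))$ and with $U=X_i$, $(W_1,\dots,W_k)=X_{\partial^2 j}$, the identity (valid for any two random variables $U$, $W$)

$$\mathbb{E}\|\mathbb{P}\{U\in\cdot\,|W\}-\mathbb{P}\{U\in\cdot\}\|_{TV}=\|\mathbb{P}\{(U,W)\in\cdot\}-\mathbb{P}\{U\in\cdot\}\,\mathbb{P}\{W\in\cdot\}\|_{TV}\,, \qquad (5.8)$$

and the bound (that follows from the triangle inequality)

$$\|\mathbb{P}\{(U,W_1\dots W_k)\in\cdot\}-\mathbb{P}\{U\in\cdot\}\,\mathbb{P}\{(W_1\dots W_k)\in\cdot\}\|_{TV}\le 2\,\|\mathbb{P}\{(U,W_1\dots W_k)\in\cdot\}-\mathbb{P}\{U\in\cdot\}\,\mathbb{P}\{W_1\in\cdot\}\cdots\mathbb{P}\{W_k\in\cdot\}\|_{TV}\,.$$

□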
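Identity (5.8) and the factor-2 bound that follows it are elementary but easy to misread. The minimal Python sketch below checks both on an arbitrary (hypothetical) joint law of $U$ and $W=(W_1,W_2)$ over small alphabets, with $\|\cdot\|_{TV}$ equal to half the $\ell_1$ distance.

```python
import itertools
import random


def rand_joint(shape, seed=5):
    """Arbitrary strictly positive joint law on range(shape[0]) x range(shape[1]) x ..."""
    rng = random.Random(seed)
    cells = list(itertools.product(*(range(s) for s in shape)))
    w = [rng.random() + 0.01 for _ in cells]
    s = sum(w)
    return {c: wi / s for c, wi in zip(cells, w)}


def marginal(P, idx):
    """Marginal law of the coordinates listed in idx (keys are tuples)."""
    out = {}
    for c, p in P.items():
        key = tuple(c[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


P = rand_joint((3, 2, 2))  # coordinates (U, W1, W2), with W = (W1, W2)
PU, PW = marginal(P, (0,)), marginal(P, (1, 2))
PW1, PW2 = marginal(P, (1,)), marginal(P, (2,))

# Left-hand side of (5.8): E_W || P{U in . | W} - P{U in .} ||_TV
lhs = sum(PW[w] * 0.5 * sum(abs(P[(u,) + w] / PW[w] - PU[(u,)]) for u in range(3)) for w in PW)
# Right-hand side of (5.8): || P_{U,W} - P_U x P_W ||_TV
rhs = 0.5 * sum(abs(p - PU[c[:1]] * PW[c[1:]]) for c, p in P.items())
# Distance to the fully factorized law P_U x P_{W1} x P_{W2}
full = 0.5 * sum(abs(p - PU[c[:1]] * PW1[c[1:2]] * PW2[c[2:]]) for c, p in P.items())
print(lhs, rhs, 2 * full)  # lhs equals rhs up to rounding, and rhs <= 2 * full
```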



An analogous Lemma estimates the effect of removing a function node. 



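Lemma 5.3. Consider two observation systems associated to graphs $G=(V,F,E)$ and $G'=(V',F',E')$ whereby $V=V'$, $F=F'\setminus\{a\}$ and $E=E'\setminus\{(j,a):j\in\partial a\}$. Denote the corresponding observations as $(Y,Z(\theta))$ and $(Y',Z'(\theta))$, with $Z(\theta)=Z'(\theta)$ and $Y=Y'\setminus\{Y'_a\}$. Then, for any $i\in V$,

$$\mathbb{E}\|\mathbb{P}\{X_i\in\cdot\,|Y,Z(\theta)\}-\mathbb{P}\{X_i\in\cdot\,|Y',Z'(\theta)\}\|_{TV}\le 4\,\mathbb{E}\Big\|\mathbb{P}_{i,\partial a}\{\cdots|Y,Z(\theta)\}-\prod_{l\in\{i\}\cup\partial a}\mathbb{P}_l\{\cdot\,|Y,Z(\theta)\}\Big\|_{TV}\,, \qquad (5.9)$$

where we used the shorthand $\mathbb{P}_U\{\cdots|Y,Z(\theta)\}$ for $\mathbb{P}\{X_U\in\cdots|Y,Z(\theta)\}$.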
Proof. The proof is completely analogous to (and indeed easier than) the one of Lemma 5.2. It is sufficient to consider the Markov chain $X_i\to(X_{\partial a},Y',Z'(\theta))\to(Y',Z'(\theta))\to(Y'_{F'\setminus a},Z'(\theta))$, and bound the total variation distance considered here in terms of the first and last term in the chain, where we notice that $(Y'_{F'\setminus a},Z'(\theta))=(Y,Z(\theta))$. We omit details to avoid redundancies. □

Next, we study the effect of removing a variable node from a random bipartite graph. 

Lemma 5.4. Let $G=(V,F,E)$ and $G'=(V',F',E')$ be two random graphs from, respectively, the $\mathbb{G}(n-1,\alpha n,\gamma/n)$ and $\mathbb{G}(n,\alpha n,\gamma/n)$ ensembles. Consider two information systems on such graphs. Let $(Y,Z(\theta))$ and $(Y',Z'(\theta))$ be the corresponding observations, and $\mu^\theta_i$, $\mu'^\theta_i$ the conditional distributions of $X_i$ in the two systems.

It is then possible to couple $G$ to $G'$ and, for each $\theta$, $(Y,Z(\theta))$ to $(Y',Z'(\theta))$, and choose a constant $C=C(|\mathcal{X}|,\alpha,\gamma)$ (bounded uniformly for $\gamma$ and $1/\alpha$ bounded), such that, for any $\epsilon>0$ and any $i\in V\cap V'$,

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\|\mu^\theta_i-\mu'^\theta_i\|_{TV}\,d\theta\le\frac{C}{\sqrt{n}}\,. \qquad (5.10)$$

Further, such a coupling can be produced by letting $V'=V\cup\{n\}$, $F'=F$ and $E'=E\cup\{(n,a):a\in\partial n\}$, where $a\in\partial n$ independently with probability $\gamma/n$. Finally $(Y,Z(\theta))$ and $(Y',Z'(\theta))$ are coupled as in Lemma 5.2.



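Proof. Take $V=[n-1]$, $V'=[n]$, $F=F'=[\alpha n]$ and sample the edges by letting, for any $i\in[n-1]$, $(i,a)\in E$ if and only if $(i,a)\in E'$. Therefore $E=E'\setminus\{(n,a):a\in\partial n\}$ (here $\partial n$ is the neighborhood of variable node $n$ with respect to the edge set $E'$). Coupling $(Y,Z(\theta))$ and $(Y',Z'(\theta))$ as in Lemma 5.2 and using the bound proved there, we get

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\|\mu^\theta_i-\mu'^\theta_i\|_{TV}\,d\theta\le 4\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\Big\|\mathbb{P}_{i,\partial^2 n}\{\cdots|Y_{F\setminus\partial n},Z(\theta)\}-\prod_{l\in\{i\}\cup\partial^2 n}\mathbb{P}_l\{\cdot\,|Y_{F\setminus\partial n},Z(\theta)\}\Big\|_{TV}\,d\theta\,. \qquad (5.11)$$

In order to estimate the total variation distance on the right hand side, we shall condition on $|\partial n|$ and $|\partial^2 n|$. Once this is done, the conditional probability $\mathbb{P}_{i,\partial^2 n}\{\cdots|Y_{F\setminus\partial n},Z(\theta)\}$ is distributed as the conditional probability of $|\partial^2 n|+1$ variables, in a system $G(|\partial n|)$ with $n-1$ variable nodes and $\alpha n-|\partial n|$ function nodes. Let us denote by $(\widehat{Y},\widehat{Z}(\theta))$ the corresponding observations (and by $\widehat{\mathbb{P}}$, $\widehat{\mathbb{E}}$ probability and expectations). Then the right hand side in Eq. (5.11) is equal to

$$4\,\mathbb{E}\Big\{\int_0^{\epsilon}\widehat{\mathbb{E}}_{G(|\partial n|)}\,\widehat{\mathbb{E}}\Big\|\widehat{\mathbb{P}}_{1\dots|\partial^2 n|+1}\{\cdots|\widehat{Y},\widehat{Z}(\theta)\}-\prod_{l=1}^{|\partial^2 n|+1}\widehat{\mathbb{P}}_l\{\cdot\,|\widehat{Y},\widehat{Z}(\theta)\}\Big\|_{TV}\,d\theta\Big\}$$
$$\le 4\,\mathbb{E}\Big\{(|\mathcal{X}|+1)^{|\partial^2 n|+1}A_{n-1,|\partial^2 n|+1}\sqrt{H(X_1)\epsilon/(n-1)}\;\mathbb{I}\big(|\partial^2 n|+1\le\sqrt{n}/10\big)\Big\}+4\epsilon\,\mathbb{P}\{|\partial^2 n|+1>\sqrt{n}/10\}\,.$$

In the last step we applied Corollary 3.3 and distinguished the cases $|\partial^2 n|+1>\sqrt{n}/10$ (then bounding the total variation distance by 1) and $|\partial^2 n|+1\le\sqrt{n}/10$ (then bounding $A_{n-1,|\partial^2 n|+1}$ by $\sqrt{2}$ thanks to the estimate in Theorem 2.2). The thesis follows using Proposition 4.2 to bound both terms above (notice in fact that $|\partial^2 n|\le|B(n,1)|$). □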

Again, an analogous estimate holds for the effect of removing one function node. The proof is omitted 
as it is almost identical to the previous one. 

Lemma 5.5. Let $G=(V,F,E)$ and $G'=(V',F',E')$ be two random graphs from, respectively, the $\mathbb{G}(n,\alpha n-1,\gamma/n)$ and $\mathbb{G}(n,\alpha n,\gamma/n)$ ensembles. Consider two information systems on such graphs. Let $(Y,Z(\theta))$ and $(Y',Z'(\theta))$ be the corresponding observations, and $\mu^\theta_i$, $\mu'^\theta_i$ the conditional distributions of $X_i$ in the two systems.

It is then possible to couple $G$ to $G'$ and, for each $\theta$, $(Y,Z(\theta))$ to $(Y',Z'(\theta))$, and choose a constant $C=C(|\mathcal{X}|,\alpha,\gamma)$ (bounded uniformly for $\gamma$ and $1/\alpha$ bounded), such that, for any $\epsilon>0$ and any $i\in V\cap V'$,

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\|\mu^\theta_i-\mu'^\theta_i\|_{TV}\,d\theta\le\frac{C}{\sqrt{n}}\,. \qquad (5.12)$$

Further, such a coupling can be produced by letting $V'=V$, $F=F'\setminus\{a\}$, for a fixed function node $a$, and $E'=E\cup\{(j,a):j\in\partial a\}$, where $j\in\partial a$ independently with probability $\gamma/n$. Finally $(Y,Z(\theta))$ and $(Y',Z'(\theta))$ are coupled as in Lemma 5.3.

5.2 BP equations 

We begin by proving a useful technical Lemma. 

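Lemma 5.6. Let $p_1$, $p_2$ be probability distributions over a finite set $S$, and $q:S\times S\to\mathbb{R}_+$ be a non-negative function. Define, for $a=1,2$, the probability distributions

$$\hat p_a(x)\equiv\frac{\sum_y q(x,y)\,p_a(y)}{\sum_{x',y'}q(x',y')\,p_a(y')}\,. \qquad (5.13)$$

Then

$$\|\hat p_1-\hat p_2\|_{TV}\le 2\,\frac{\max_y\sum_x q(x,y)}{\min_y\sum_x q(x,y)}\,\|p_1-p_2\|_{TV}\,. \qquad (5.14)$$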

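Proof. Using the inequality $|(a_1/b_1)-(a_2/b_2)|\le|a_1-a_2|/b_1+(a_2/b_2)|b_1-b_2|/b_1$ (valid for $a_1,a_2,b_1,b_2>0$), we get

$$|\hat p_1(x)-\hat p_2(x)|\le\frac{\sum_y q(x,y)|p_1(y)-p_2(y)|}{\sum_{x',y'}q(x',y')p_1(y')}+\frac{\sum_y q(x,y)p_2(y)}{\sum_{x',y'}q(x',y')p_2(y')}\cdot\frac{\left|\sum_{x',y'}q(x',y')(p_1(y')-p_2(y'))\right|}{\sum_{x',y'}q(x',y')p_1(y')}\,.$$

Summing over $x$ we get

$$\|\hat p_1-\hat p_2\|_{TV}\le\frac{\sum_y\left(\sum_x q(x,y)\right)|p_1(y)-p_2(y)|}{\sum_{y'}\left(\sum_{x'}q(x',y')\right)p_1(y')}\,,$$

whence the thesis follows. □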
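A minimal Python sketch checking (5.14) on random instances follows. The helper names and the sampling scheme are hypothetical; `q[x][y]` plays the role of $q(x,y)$, and $\|\cdot\|_{TV}$ is half the $\ell_1$ distance.

```python
import random


def normalize(w):
    s = sum(w)
    return [wi / s for wi in w]


def tv(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def reweight(q, p):
    """hat p(x) = sum_y q[x][y] p[y] / sum_{x', y'} q[x'][y'] p[y']."""
    return normalize([sum(qxy * py for qxy, py in zip(row, p)) for row in q])


def check_lemma_56(trials=2000, size=4, seed=6):
    rng = random.Random(seed)
    for _ in range(trials):
        q = [[rng.random() + 1e-3 for _ in range(size)] for _ in range(size)]
        p1 = normalize([rng.random() + 1e-3 for _ in range(size)])
        p2 = normalize([rng.random() + 1e-3 for _ in range(size)])
        col = [sum(q[x][y] for x in range(size)) for y in range(size)]  # sum_x q(x, y)
        bound = 2 * max(col) / min(col) * tv(p1, p2)
        if tv(reweight(q, p1), reweight(q, p2)) > bound + 1e-12:
            return False
    return True


print(check_lemma_56())
```

In the proof of Theorem 5.8 the same structure appears with $x\leftrightarrow X_i$, $y\leftrightarrow X_D$ and $\sum_x q(x,y)=\mathbb{P}\{W_B|X_D=y\}$.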

Given a graph $G$, $i\in V$, $t\ge 1$, we let $B=B(i,t)$, $\bar B=\bar B(i,t)$ and $D=D(i,t)$. Further, we introduce the shorthands

$$W_B=\{Y_a:\partial a\subseteq B,\ \partial a\not\subseteq D\}\cup\{Z_i:i\in B\setminus D\}\,, \qquad (5.15)$$
$$W_{\bar B}=\{Y_a:\partial a\subseteq\bar B\}\cup\{Z_i:i\in\bar B\}\,. \qquad (5.16)$$

Notice that $W_B$, $W_{\bar B}$ form a partition of the variables in $Y,Z(\theta)$. Further, $W_B$, $W_{\bar B}$ are conditionally independent given $X_D$. As a consequence, we have the following simple bound.

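Lemma 5.7. For any two non-negative functions $f$ and $g$, we have

$$\mathbb{E}\{f(W_B)g(W_{\bar B})\}\le\max_{x_D}\mathbb{E}\{f(W_B)|X_D=x_D\}\;\mathbb{E}\{g(W_{\bar B})\}\,. \qquad (5.17)$$

Proof. Using the conditional independence property we have

$$\mathbb{E}\{f(W_B)g(W_{\bar B})\}=\mathbb{E}\{\mathbb{E}[f(W_B)|X_D]\,\mathbb{E}[g(W_{\bar B})|X_D]\}\le\max_{x_D}\mathbb{E}\{f(W_B)|X_D=x_D\}\;\mathbb{E}\{\mathbb{E}[g(W_{\bar B})|X_D]\}\,,$$

which proves our claim. □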
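It is easy to see that the conditional distribution of $(X_i,X_D)$ takes the form (with an abuse of notation we write $\mathbb{P}\{X_U|\cdots\}$ instead of $\mathbb{P}\{X_U=x_U|\cdots\}$)

$$\mathbb{P}\{X_i,X_D|Y,Z(\theta)\}=\frac{\mathbb{P}\{X_i,W_B|X_D,W_{\bar B}\}\,\mathbb{P}\{X_D|W_{\bar B}\}}{\sum_{x'_i,x'_D}\mathbb{P}\{x'_i,W_B|x'_D,W_{\bar B}\}\,\mathbb{P}\{x'_D|W_{\bar B}\}} \qquad (5.18)$$
$$=\frac{\mathbb{P}\{X_i,W_B|X_D\}\,\mathbb{P}\{X_D|W_{\bar B}\}}{\sum_{x'_i,x'_D}\mathbb{P}\{x'_i,W_B|x'_D\}\,\mathbb{P}\{x'_D|W_{\bar B}\}}\,. \qquad (5.19)$$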

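If $B$ is a small neighborhood of $i$, the most intricate component in the above formulae is the probability $\mathbb{P}\{X_D|W_{\bar B}\}$. It would be nice if we could replace this term by the product of the marginal probabilities of $X_j$, for $j\in D$. We thus define

$$\mathbb{P}\{X_i,X_D\,\|\,Y,Z(\theta)\}\equiv\frac{\mathbb{P}\{X_i,W_B|X_D\}\prod_{j\in D}\mathbb{P}\{X_j|W_{\bar B}\}}{\sum_{x'_i,x'_D}\mathbb{P}\{x'_i,W_B|x'_D\}\prod_{j\in D}\mathbb{P}\{x'_j|W_{\bar B}\}}\,. \qquad (5.20)$$

Notice that this is a probability kernel, but not a conditional probability (to stress this point we used the double separator $\|$).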

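Finally, we recall the definition of the local marginal $\mu^\theta_i(x_i)$ and introduce, by analogy, the approximation $\mu^{(t)}_i(x_i)$:

$$\mu^\theta_i(x_i)=\sum_{x_D}\mathbb{P}\{X_i=x_i,X_D=x_D|Y,Z(\theta)\}\,,\qquad\mu^{(t)}_i(x_i)=\sum_{x_D}\mathbb{P}\{X_i=x_i,X_D=x_D\,\|\,Y,Z(\theta)\}\,. \qquad (5.21)$$

It is easy to see that, for $t=1$, $\mu^{(t)}_i$ is nothing but the result of applying belief propagation to the marginals of the neighbors of $i$ with respect to the reduced graph that does not include $i$. Formally, in the notation of Theorem 2.3,

$$\mu^{(1)}_i(x_i)=F_n\big(\{\mu^\theta_{j\to a}\}_{a\in\partial i,\,j\in\partial a\setminus i}\big)(x_i)\,. \qquad (5.22)$$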

The result below shows that indeed the boundary condition on $X_D$ can be chosen as factorized, thus providing a more general version of Theorem 2.3.



Theorem 5.8 (BP equations, more general version). Consider an observations system on a random bipartite graph $G=(V,F,E)$ from the $\mathbb{G}(n,\alpha n,\gamma/n)$ ensemble, and assume the noisy observations to be $M$-soft. Then there exists a constant $A$ depending on $t,\alpha,\gamma,M,|\mathcal{X}|,\epsilon$, such that, for any $i\in V$ and any $n$,

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\|\mu^\theta_i-\mu^{(t)}_i\|_{TV}\,d\theta\le\frac{A}{\sqrt{n}}\,. \qquad (5.23)$$



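Proof. Notice that the definitions of $\mu^\theta_i$, $\mu^{(t)}_i$ have the same form as $\hat p_1$, $\hat p_2$ in Lemma 5.6, whereby $x$ corresponds to $X_i$ and $y$ to $X_D$. We have therefore

$$\|\mu^\theta_i-\mu^{(t)}_i\|_{TV}\le 2\left(\frac{\max_{x_D}\mathbb{P}\{W_B|X_D=x_D\}}{\min_{x_D}\mathbb{P}\{W_B|X_D=x_D\}}\right)\Big\|\mathbb{P}\{X_D=\cdot\,|W_{\bar B}\}-\prod_{j\in D}\mathbb{P}\{X_j=\cdot\,|W_{\bar B}\}\Big\|_{TV}\,. \qquad (5.24)$$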



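Given observations $Z(\theta)$ and $U\subseteq V$, let us denote as $\mathcal{E}(Z(\theta),U)$ the set of values $x_U$ such that, for any $i\in U$ with $Z_i(\theta)=(Z_i,\tilde X_i)$ and $\tilde X_i\neq *$, one has $x_i=\tilde X_i$ (i.e. the set of assignments $x_U$ that are compatible with direct observations). Notice that the factor in parentheses can be upper bounded as

$$\frac{\max_{x_D}\sum_{x_{B\setminus D}}\mathbb{P}\{W_B|X_B=x_B\}\,\mathbb{P}\{X_{B\setminus D}=x_{B\setminus D}\}}{\min_{x_D}\sum_{x_{B\setminus D}}\mathbb{P}\{W_B|X_B=x_B\}\,\mathbb{P}\{X_{B\setminus D}=x_{B\setminus D}\}}\le\frac{\max_{x_D}\max_{x_{B\setminus D}\in\mathcal{E}(Z(\theta),B\setminus D)}\mathbb{P}\{W_B|X_B=x_B\}}{\min_{x_D}\min_{x_{B\setminus D}\in\mathcal{E}(Z(\theta),B\setminus D)}\mathbb{P}\{W_B|X_B=x_B\}} \qquad (5.25)$$
$$=\frac{\max_{x_B\in\mathcal{E}(Z(\theta),B\setminus D)}\mathbb{P}\{W_B|X_B=x_B\}}{\min_{x_B\in\mathcal{E}(Z(\theta),B\setminus D)}\mathbb{P}\{W_B|X_B=x_B\}}\,. \qquad (5.26)$$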
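Using Lemma 5.7 to take expectation with respect to the observations $(Y,Z(\theta))$, we get

$$\mathbb{E}\|\mu^\theta_i-\mu^{(t)}_i\|_{TV}\le C(B)\;\mathbb{E}\Big\|\mathbb{P}\{X_D=\cdot\,|W_{\bar B}\}-\prod_{j\in D}\mathbb{P}\{X_j=\cdot\,|W_{\bar B}\}\Big\|_{TV}\,, \qquad (5.27)$$

where (with the shorthand $\mathbb{P}\{\cdot\,|x_U\}$ for $\mathbb{P}\{\cdot\,|X_U=x_U\}$, and omitting the arguments from $\mathcal{E}$, since they are clear from the context)

$$C(B)=2\max_{x_D}\mathbb{E}\left\{\frac{\max_{x'_D}\mathbb{P}\{W_B|x'_D\}}{\min_{x'_D}\mathbb{P}\{W_B|x'_D\}}\,\Big|\,x_D\right\}\le 2\max_{x_D}\mathbb{E}\left\{\frac{\max_{x'_B\in\mathcal{E}}\mathbb{P}\{W_B|X_B=x'_B\}}{\min_{x'_B\in\mathcal{E}}\mathbb{P}\{W_B|X_B=x'_B\}}\,\Big|\,x_D\right\}$$
$$\le 2\max_{x_B}\mathbb{E}\left\{\prod_{a\in B}\frac{\max_{x'_{\partial a}}\mathbb{P}\{Y_a|x'_{\partial a}\}}{\min_{x'_{\partial a}}\mathbb{P}\{Y_a|x'_{\partial a}\}}\prod_{i\in B\setminus D}\frac{\max_{x'_i\in\mathcal{E}}\mathbb{P}\{Z_i(\theta)|x'_i\}}{\min_{x'_i\in\mathcal{E}}\mathbb{P}\{Z_i(\theta)|x'_i\}}\,\Big|\,x_B\right\}$$
$$\le 2\prod_{a\in B}\max_{x_{\partial a}}\mathbb{E}\left\{\frac{\max_{x'_{\partial a}}\mathbb{P}\{Y_a|x'_{\partial a}\}}{\min_{x'_{\partial a}}\mathbb{P}\{Y_a|x'_{\partial a}\}}\,\Big|\,X_{\partial a}=x_{\partial a}\right\}\prod_{i\in B\setminus D}\max_{x_i}\mathbb{E}\left\{\frac{\max_{x'_i}\mathbb{P}\{Z_i|x'_i\}}{\min_{x'_i}\mathbb{P}\{Z_i|x'_i\}}\,\Big|\,X_i=x_i\right\}\le M^{|B|}\,.$$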




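In the last step we used the hypothesis of soft noise; before that, we replaced $Z_i(\theta)$ by $Z_i$, because the difference is irrelevant under the restriction $x\in\mathcal{E}$, and subsequently removed this restriction. We now take the expectation of Eq. (5.27) over the random graph $G$, conditional on $B$:

$$\mathbb{E}_G\big\{\mathbb{E}\|\mu^\theta_i-\mu^{(t)}_i\|_{TV}\,\big|\,B\big\}\le M^{|B|}\,\mathbb{E}_G\Big\{\mathbb{E}\Big\|\mathbb{P}\{X_D=\cdot\,|W_{\bar B}\}-\prod_{j\in D}\mathbb{P}\{X_j=\cdot\,|W_{\bar B}\}\Big\|_{TV}\,\Big|\,B\Big\}\,. \qquad (5.28)$$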



Notice that the conditional expectation is equivalent to an expectation over a random graph on variable nodes $(V\setminus V(B))\cup D$ and function nodes $F\setminus F(B)$ (where $V(B)$ and $F(B)$ denote the variable and function node sets of $B$). The distribution of this 'residual graph' is the same as for the original ensemble: for any $j\in(V\setminus V(B))\cup D$ and any $b\in F\setminus F(B)$, the edge $(j,b)$ is included independently with probability $\gamma/n$. We can therefore apply Corollary 3.3:

$$\int_0^{\epsilon}\mathbb{E}_G\big\{\mathbb{E}\|\mu^\theta_i-\mu^{(t)}_i\|_{TV}\,\big|\,B\big\}\,d\theta\le M^{|B|}(|\mathcal{X}|+1)^{|B|}A_{n-|B|,|B|}\sqrt{H(X_1)\epsilon/(n-|B|)}\,. \qquad (5.29)$$



We can now take expectation over $B=B(i,t)$, and invert expectation and integral over $\theta$, since the integrand is non-negative and bounded. We single out the case $|B|>\sqrt{n}/10$ and upper bound the total variation distance by 1 in this case. In the case $|B|\le\sqrt{n}/10$ we upper bound $A_{n-|B|,|B|}$ by $\sqrt{2}$ and lower bound $n-|B|$ by $n/2$, thus yielding, for $\overline{M}=M(1+|\mathcal{X}|)$:

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\|\mu^\theta_i-\mu^{(t)}_i\|_{TV}\,d\theta\le\sqrt{4H(X_1)\epsilon/n}\;\mathbb{E}\{\overline{M}^{|B|}\}+\epsilon\,\mathbb{P}\{|B|>\sqrt{n}/10\}\,. \qquad (5.30)$$

The thesis follows by applying Proposition 4.2 to both terms. □



6 Proof of Theorem 2.4 (density evolution)

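Given a probability distribution $S$ over $\mathcal{M}_2(\mathcal{X})$, we define the probability distribution $\mathbb{P}_{S,k}$ over $\mathcal{M}(\mathcal{X})\times\cdots\times\mathcal{M}(\mathcal{X})$ ($k$ times) by

$$\mathbb{P}_{S,k}(d\mu_1,\dots,d\mu_k)=\int\rho^k(d\mu_1,\dots,d\mu_k)\,S(d\rho)\,, \qquad (6.1)$$

where $\rho^k$ is the law of $k$ iid random $\mu$'s with common distribution $\rho$. We shall denote by $\mathbb{E}_{S,k}$ expectation with respect to the same measure.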
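Sampling from $\mathbb{P}_{S,k}$ is a two-stage procedure: draw $\rho\sim S$, then draw $\mu_1,\dots,\mu_k$ iid from $\rho$. The minimal Python sketch below illustrates this for $\mathcal{X}=\{0,1\}$, identifying $\mu$ with $\mu(1)\in[0,1]$. The choice of $S$ (an equal mixture of two Beta laws playing the role of $\rho$) is a hypothetical example. The resulting $\mu_i$'s are exchangeable but, unless $S$ is a point mass, positively correlated; this is the de Finetti structure invoked in Lemma 6.1.

```python
import random

# Hypothetical S: an equal-weight mixture of two laws rho on [0, 1], each a Beta(a, b) law.
# With X = {0, 1}, a distribution mu on X is identified with mu(1), a number in [0, 1].
S = [(0.5, (2.0, 8.0)), (0.5, (8.0, 2.0))]  # (weight, (a, b)); the two rho's have means 0.2 and 0.8


def sample_P_Sk(k, rng):
    """One draw (mu_1(1), ..., mu_k(1)) from P_{S,k}: first rho ~ S, then k iid draws from rho."""
    _, (a, b) = rng.choices(S, weights=[w for w, _ in S])[0]
    return [rng.betavariate(a, b) for _ in range(k)]


def covariance_12(draws):
    """Empirical covariance of the first two coordinates."""
    m1 = sum(d[0] for d in draws) / len(draws)
    m2 = sum(d[1] for d in draws) / len(draws)
    return sum((d[0] - m1) * (d[1] - m2) for d in draws) / len(draws)


rng = random.Random(7)
draws = [sample_P_Sk(2, rng) for _ in range(20000)]
# Exchangeable but correlated: Cov(mu_1, mu_2) equals the variance of the mixture means,
# 0.25 * (0.8 - 0.2)**2 = 0.09, so the printed estimate should be close to 0.09.
print(covariance_12(draws))
```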
Lemma 6.1. Let $G=(V,F,E)$ be any bipartite graph whose distribution is invariant under permutation of the variable nodes in $V=[n]$, and $\mu_i(\cdot)=\mathbb{P}\{X_i=\cdot\,|Y,Z\}$ the marginals of an observations system on $G$. Then, for any diverging sequence $R_0\subseteq\mathbb{N}$ there exists a subsequence $R$ and a distribution $S$ on $\mathcal{M}_2(\mathcal{X})$ such that, for any subset $\{i(1),\dots,i(k)\}\subseteq[n]$ of distinct variable nodes, and any bounded Lipschitz function $\varphi:\mathcal{M}(\mathcal{X})^k\to\mathbb{R}$,

$$\lim_{n\in R}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu_{i(1)},\dots,\mu_{i(k)})\}=\mathbb{E}_{S,k}\{\varphi(\mu_1,\dots,\mu_k)\}\,. \qquad (6.2)$$

Proof. We shall assume, without loss of generality, that $R_0=\mathbb{N}$. Notice that $(\mu_1,\dots,\mu_n)$ is a family of exchangeable random variables. By tightness, for each $i=1,2,\dots$, there exists a subsequence $R_i$ such that $(\mu_1,\dots,\mu_i)$ converges in distribution, and $R_{i+1}\subseteq R_i$. Construct the subsequence $R$ whose $j$-th element is the $j$-th element of $R_j$. Then for any $k$, $(\mu_{i(1)},\dots,\mu_{i(k)})$ converges in distribution along $R$ to an exchangeable set $(\mu^{(1)},\dots,\mu^{(k)})$. Further, the projection of the law of $(\mu^{(1)},\dots,\mu^{(k)})$ on the first $k-1$ variables is the law of $(\mu^{(1)},\dots,\mu^{(k-1)})$. Therefore, this defines an exchangeable distribution over the infinite collection of random variables $\{\mu_i:i=1,2,\dots\}$. By the de Finetti, Hewitt-Savage Theorem [dF69, HS55] there exists $S$ such that, for any $k$, the joint distribution of $(\mu_1,\dots,\mu_k)$ is $\mathbb{P}_{S,k}$. In particular

$$\lim_{n\in R}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu_{i(1)},\dots,\mu_{i(k)})\}=\mathbb{E}\{\varphi(\mu_1,\dots,\mu_k)\}=\mathbb{E}_{S,k}\{\varphi(\mu_1,\dots,\mu_k)\}\,.$$

□



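Proof (Main Theorem). By Lemma 6.1, Eq. (2.18) holds for some probability distribution $S_\theta$ on $\mathcal{M}_2(\mathcal{X})$. It remains to prove that $S_\theta$ is supported over the fixed points of the density evolution equation (2.12).

Let $\varphi:\mathcal{M}(\mathcal{X})\to\mathbb{R}$ be a test function that we can assume, without loss of generality, bounded by 1 and with Lipschitz constant 1. Further, let $D(i)=D(i,1)$ and $\mu^\theta_{D(i)}=\{\mu^\theta_j:j\in D(i)\}$. By Theorem 2.3, together with the Lipschitz property and boundedness, we have

$$\int_0^{\epsilon}\mathbb{E}_G\,\mathbb{E}\big|\varphi(\mu^\theta_i)-\varphi(F^n_i(\mu^\theta_{D(i)}))\big|\,d\theta\le\frac{A}{\sqrt{n}}\,. \qquad (6.3)$$

Fix now two variable nodes, say $i=1$ and $i=2$. Using Cauchy-Schwarz, this implies

$$\int_0^{\epsilon}\Big|\mathbb{E}_G\,\mathbb{E}\Big\{\big[\varphi(\mu^\theta_1)-\varphi(F^n_1(\mu^\theta_{D(1)}))\big]\big[\varphi(\mu^\theta_2)-\varphi(F^n_2(\mu^\theta_{D(2)}))\big]\Big\}\Big|\,d\theta\le\frac{2A}{\sqrt{n}}\,.$$

Applying the dominated convergence theorem, it follows that, for almost all $\theta\in[0,\epsilon]$,

$$\lim_{n\to\infty}\mathbb{E}_G\,\mathbb{E}\Big\{\big[\varphi(\mu^\theta_1)-\varphi(F^n_1(\mu^\theta_{D(1)}))\big]\big[\varphi(\mu^\theta_2)-\varphi(F^n_2(\mu^\theta_{D(2)}))\big]\Big\}=0\,. \qquad (6.4)$$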



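By Lemma 6.1 we can find a sequence $R_\theta$, and a distribution $S_\theta$ over $\mathcal{M}_2(\mathcal{X})$, such that Eq. (6.2) holds. We claim that along such a sequence

$$\lim_{n\in R_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^\theta_1)\varphi(\mu^\theta_2)\}=\mathbb{E}_{S_\theta,2}\{\varphi(\mu_1)\varphi(\mu_2)\}\,, \qquad (6.5)$$
$$\lim_{n\in R_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^\theta_1)\varphi(F^n_2(\mu^\theta_{D(2)}))\}=\mathbb{E}\,\mathbb{E}_{S_\theta,k+1}\{\varphi(\mu_1)\varphi(F^\infty(\mu_2,\dots,\mu_{k+1}))\}\,, \qquad (6.6)$$
$$\lim_{n\in R_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(F^n_1(\mu^\theta_{D(1)}))\varphi(F^n_2(\mu^\theta_{D(2)}))\}=\mathbb{E}\,\mathbb{E}_{S_\theta,k_1+k_2}\{\varphi(F^\infty_1(\mu_1,\dots,\mu_{k_1}))\varphi(F^\infty_2(\mu_{k_1+1},\dots,\mu_{k_1+k_2}))\}\,. \qquad (6.7)$$

Here the expectations on the right hand sides are with respect to marginals $\mu_1,\mu_2,\dots$ distributed according to $\mathbb{P}_{S_\theta,\cdot}$ (this expectation is denoted as $\mathbb{E}_{S_\theta,\cdot}$) as well as with respect to independent random mappings $F^\infty:\mathcal{M}(\mathcal{X})^k\to\mathcal{M}(\mathcal{X})$ defined as in Section 2, cf. Eq. (2.12) (this includes expectation with respect to $k$, $k_1$, $k_2$ and is denoted as $\mathbb{E}$).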
Before proving the above limits, let us show that they imply the thesis. Substituting Eqs. (6.5) to (6.7) in Eq. (6.4) and re-ordering the terms we get

$$\int\Delta(\rho)^2\,S_\theta(d\rho)=0\,, \qquad (6.8)$$
$$\Delta(\rho)=\int\varphi(\mu)\,\rho(d\mu)-\mathbb{E}\int\varphi(F^\infty(\mu_1,\dots,\mu_k))\,\rho(d\mu_1)\cdots\rho(d\mu_k)\,. \qquad (6.9)$$

Therefore $\Delta(\rho)=0$ $S_\theta$-almost surely, which is what we needed to show in order to prove Theorem 2.4.

Let us now prove the limits above. Equation (6.5) is an immediate consequence of Lemma 6.1. Next consider Eq. (6.6), and condition the expectation on the left-hand side upon $B(i=2,t=1)=B$, as well as upon $W_B$, cf. Eq. (5.15). Denote by $\mu^{(2),\theta}_j=\mathbb{P}\{X_j=\cdot\,|W_{\bar B}\}$ the marginals computed on the residual graph. First notice that, by Lemmas 5.4 and 5.5,

$$\mathbb{E}_G\,\mathbb{E}\{\|\mu^\theta_1-\mu^{(2),\theta}_1\|_{TV}\,|\,B,W_B\}\le\frac{C|B|}{\sqrt{n-|B|}}\,. \qquad (6.10)$$

As a consequence, by the Lipschitz property and boundedness of $\varphi$,

$$\Big|\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^\theta_1)\varphi(F^n_2(\mu^\theta_{D(2)}))\,|\,B,W_B\}-\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^{(2),\theta}_1)\varphi(F^n_2(\mu^{(2),\theta}_{D(2)}))\,|\,B,W_B\}\Big|\le\frac{C'|B|}{\sqrt{n-|B|}}\,. \qquad (6.11)$$

In the second term the $\mu^{(2),\theta}_j$ are independent of the conditioning and of the function $F^n_2$ (which is deterministic once $B$, $W_B$ are given). Therefore, by Lemma 6.1 (here we are taking the limit on the joint distribution of the $\mu^{(2),\theta}_j$, but not on $F^n_2$; to emphasize this point we denote the latter as $F^*_2$),

$$\lim_{n\in R_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^{(2),\theta}_1)\varphi(F^*_2(\mu^{(2),\theta}_{D(2)}))\,|\,B,W_B\}=\mathbb{E}_{S_\theta,k}\{\varphi(\mu_1)\varphi(F^*_2(\mu_2,\dots,\mu_{1+|D(2)|}))\}\,, \qquad (6.12)$$

with $k=1+|D(2)|$. (Notice that the graph whose expectation is considered on the left hand side is from the ensemble $\mathbb{G}(n-|V(B)|,\alpha n-|F(B)|,\gamma/n)$. The limit measure $S_\theta$ could a priori be different from the one for the ensemble $\mathbb{G}(n,\alpha n,\gamma/n)$. However, Lemmas 5.4 and 5.5 imply that this cannot be the case.)

By using dominated convergence and Eq. (6.12) we get

$$\lim_{n\in R_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^\theta_1)\varphi(F^*_2(\mu^\theta_{D(2)}))\}=\mathbb{E}_{B,W_B}\Big\{\lim_{n\in R_\theta}\mathbb{E}_G\,\mathbb{E}\{\varphi(\mu^\theta_1)\varphi(F^*_2(\mu^\theta_{D(2)}))\,|\,B,W_B\}\Big\}=\mathbb{E}_{B,W_B}\,\mathbb{E}_{S_\theta,k}\{\varphi(\mu_1)\varphi(F^*_2(\mu_2,\dots,\mu_{1+|D(2)|}))\}\,.$$

Finally we can take the limit $n\to\infty$ in $F^*_2$ as well. By local convergence of the graph to the tree model, we have uniform convergence of $F^*_2$ to $F^\infty$ and thus Eq. (6.6).

The proof of Eq. (6.7) is completely analogous to the latter and is omitted to avoid redundancies. □